## Determining the Best Location to Open a Pet Store in Orlando, Florida
### IBM Data Science Professional Certificate

## Table of contents
* [Introduction - Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction - Business Problem <a name="introduction"></a>

The City of Orlando, Florida is nicknamed "the City Beautiful". It is one of the most-visited cities in the world, driven primarily by tourism, major events, and convention traffic; in 2018 the city drew more than 75 million visitors. The two largest and most internationally renowned tourist attractions in the Orlando area are the Walt Disney World Resort, opened by the Walt Disney Company in 1971 and located approximately 21 miles (34 km) southwest of Downtown Orlando in Bay Lake, and the Universal Orlando Resort, opened in 1990 as a major expansion of Universal Studios Florida.

#### Business Problem:
In this project we will try to find an optimal location for a pet store. This report will be especially useful to entrepreneurs interested in opening a pet store in Orlando, Florida.
There are many competing pet stores in the State of Florida, so we will look for locations around Orlando that are not already crowded with them. We are particularly interested in areas that have dog parks but few pet stores nearby. Among locations that meet these two conditions, those closer to the city center are preferred.
In this project we will identify a few of the most promising neighborhoods based on these criteria.

## Data <a name="data"></a>

Considering the established problem, some of the factors we will consider in decision-making are:
1. The number of existing dog parks in the neighborhood, regardless of the rating on Foursquare.
2. The number of pet stores in the neighborhood, if any, and the distance to each of them.
3. The distance of each neighborhood from Orlando’s city center.

The data sources that will be needed are:
1. **Google Maps API** reverse geocoding, to get the approximate address of each candidate location.
2. **Nominatim** (through the `geopy` library), to get the coordinates of the Orlando city center from a known address.
3. **Foursquare API**, to get the number and location of pet stores and dog parks in every neighborhood.

In [1]:
import pandas as pd # For Data Analysis.
import numpy as np # Handling Data in a Vectorized manner.

import folium # For Rendering Maps.

#from IPython.display import display

import matplotlib.pyplot as plt

# The Matplotlib Plotting Modules:
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import shapely
import shapely.geometry

# Import pyproj
import pyproj

import math

# import K-Means:
from sklearn.cluster import KMeans

from geopy.geocoders import Nominatim # For Converting Addresses into Lat.& Long. values.

import json # To handle JSON files. 

import requests # Request Handling.

from pandas import json_normalize # For transforming JSON files into Pandas dataframes (top-level import, pandas >= 1.0).

print('Dependencies imported!')

Dependencies imported!


### Creating Potential/Prospective Neighborhoods:

We will now create the latitude and longitude coordinates that will serve as the centroids of our prospective neighborhoods. Afterwards we will create a grid of cells covering the area of interest, which is centered on the Orlando city center and extends to a radius of **30 km**.

The first step is to find the latitude and longitude of the Orlando city center, using a specific address and the Nominatim geocoder from `geopy`.

#### Our central point for the Orlando city center is the *Orlando Centroplex*, also known as **Expo Centre: Orlando Centroplex**. Let's start by converting its address to latitude and longitude coordinates.

In [2]:
address = '355 Alexander Pl, Orlando, FL 32801, United States'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
#print(latitude, longitude)
Orlando_center = [latitude, longitude]
Orlando_center

[28.548825100000002, -81.38362060080814]

Now let's create a grid of equally spaced candidate areas, centered on the city center and within 30 km of the Orlando Centroplex. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately we need to create our grid of locations in a 2D Cartesian coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees so they can be shown on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [3]:

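def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude (degrees) to UTM X/Y (meters).
    # NOTE: zone=33 is what produced every recorded output in this notebook.
    # Orlando itself lies in UTM zone 17, so these X/Y values are distorted.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy: UTM X/Y (meters) back to longitude/latitude (degrees).
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]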

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Orlando City Center longitude={}, latitude={}'.format(Orlando_center[1], Orlando_center[0]))
x, y = lonlat_to_xy(Orlando_center[1], Orlando_center[0])
print('Orlando City Center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Orlando City Center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Orlando City Center longitude=-81.38362060080814, latitude=28.548825100000002
Orlando City Center UTM X=-8063197.2052386105, Y=11274064.823337393
Orlando City Center longitude=-81.38362060087599, latitude=28.548825100085956
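
As a side check on the `zone` argument passed to `pyproj.Proj` above, the standard UTM zone number for a longitude can be derived with a short formula. This is a minimal sketch and not part of the original notebook: the helper name `utm_zone_for_longitude` is hypothetical, and the formula ignores the special-case zones around Norway and Svalbard.

```python
import math

def utm_zone_for_longitude(lon_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide, starting with zone 1 at 180 degrees West.
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    # Longitude +180 would otherwise map to zone 61; fold it into zone 60.
    return min(zone, 60)

orlando_lon = -81.38362060080814  # longitude geocoded earlier in this notebook
print(utm_zone_for_longitude(orlando_lon))
```

For the Orlando longitude this gives zone 17, whereas the conversion functions pass `zone=33`. Projecting a point so far outside a zone's 6-degree band is a likely explanation for the large negative X value in the check output.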




Let's create a **hexagonal grid of cells**: we offset every other row and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.

In [4]:
# Setting the Orlando City center coordinates in the form of Cartesian coordinates:
Orlando_center_x, Orlando_center_y = lonlat_to_xy(Orlando_center[1], Orlando_center[0]) 

# Calculating the Vertical offset for hexagonal grid cells:
k = math.sqrt(3) / 2 
x_min = Orlando_center_x - 30000
x_step = 600
y_min = Orlando_center_y - 30000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(Orlando_center_x, Orlando_center_y, x, y)
        if (distance_from_center <= 30001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), " prospective neighborhood centers' generation is now complete!")


73  prospective neighborhood centers' generation is now complete!
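
The equal-spacing property of the hexagonal layout can be checked on a small grid. The sketch below is illustrative only: it uses a 5 x 5 grid with the origin at (0, 0) rather than the notebook's coordinates, and the helper names `hex_grid` and `nearest_distances` are hypothetical.

```python
import math

def hex_grid(x_min, y_min, n_cols, n_rows, step):
    """Return (x, y) centers of a hexagonal grid with the given spacing."""
    k = math.sqrt(3) / 2  # ratio of row spacing to column spacing
    points = []
    for i in range(n_rows):
        y = y_min + i * step * k
        # Shift every other row by half a step.
        x_offset = step / 2 if i % 2 == 0 else 0.0
        for j in range(n_cols):
            points.append((x_min + j * step + x_offset, y))
    return points

def nearest_distances(points, index, count=6):
    """Return the `count` smallest distances from points[index] to the other points."""
    px, py = points[index]
    dists = sorted(math.hypot(x - px, y - py)
                   for n, (x, y) in enumerate(points) if n != index)
    return dists[:count]

grid = hex_grid(0.0, 0.0, n_cols=5, n_rows=5, step=600)
# Index 12 is the middle cell of the 5 x 5 grid (row 2, column 2).
print([round(d, 6) for d in nearest_distances(grid, 12)])
```

With a row spacing of `step * sqrt(3) / 2` and a half-step shift on alternating rows, the six nearest neighbors of an interior cell all sit one `step` away.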




Let's visualize the data we have so far: the city center location and the candidate neighborhood centers.

In [5]:
map_Orlando = folium.Map(location=Orlando_center, zoom_start=13)
folium.Marker(Orlando_center, popup='Orlando Centroplex').add_to(map_Orlando)
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_Orlando)
    #folium.Marker([lat, lon]).add_to(map_Orlando)
map_Orlando

OK, we now have the coordinates of the centers of the locations/neighborhoods to be evaluated, equally spaced (the distance from every point to its neighbors is exactly the same) and within 30 km of the Orlando Centroplex.

Let's now use the Google Maps API to get the approximate addresses of those locations.

In [6]:
api_key = 'YOUR_GOOGLE_MAPS_API_KEY' # Redacted; keep real keys out of shared notebooks.

In [7]:

def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except Exception:
        # Any failure (network error, denied request, empty result list) yields None.
        return None

addr = get_address(api_key, Orlando_center[0], Orlando_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(Orlando_center[0], Orlando_center[1], addr))

Reverse geocoding check
-----------------------
Address of [28.548825100000002, -81.38362060080814] is: None
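
The check above prints `None` without saying why, because `get_address` swallows every error. A minimal sketch of a parser that reports the reason is shown here. It assumes the response carries the `status`, `results`, and optional `error_message` fields of the Geocoding API JSON format; the function name and both sample dictionaries are made up for illustration.

```python
def parse_geocode_response(response):
    """Return (address, reason); exactly one of the two is None."""
    status = response.get('status', 'UNKNOWN')
    results = response.get('results', [])
    if status == 'OK' and results:
        return results[0]['formatted_address'], None
    # Prefer the server's message when present, else fall back to the status code.
    return None, response.get('error_message', status)

# Hypothetical responses, shaped like Geocoding API JSON.
ok = {'status': 'OK',
      'results': [{'formatted_address': '1 Example St, Orlando, FL 32801, USA'}]}
denied = {'status': 'REQUEST_DENIED', 'results': [],
          'error_message': 'The provided API key is invalid.'}

print(parse_geocode_response(ok))
print(parse_geocode_response(denied))
```

Returning the reason alongside the address makes it possible to tell a rejected key apart from a location that simply has no address.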


In [8]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', United States', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [9]:
addresses[150:170]

[]

The slice is empty because only 73 locations were generated, so there are no entries at positions 150 to 170. Let's now place all this into a Pandas dataframe.

In [10]:
#import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | NO ADDRESS | 28.496009 | -81.246508 | -8080897.0 | 11250060.0 | 29820.965779 |
| 1 | NO ADDRESS | 28.491619 | -81.248054 | -8081797.0 | 11250580.0 | 29954.773716 |
| 2 | NO ADDRESS | 28.494208 | -81.248716 | -8081197.0 | 11250580.0 | 29585.950523 |
| 3 | NO ADDRESS | 28.489818 | -81.250261 | -8082097.0 | 11251100.0 | 29738.980089 |
| 4 | NO ADDRESS | 28.492407 | -81.250923 | -8081497.0 | 11251100.0 | 29361.317013 |
| 5 | NO ADDRESS | 28.494997 | -81.251585 | -8080897.0 | 11251100.0 | 28991.152732 |
| 6 | NO ADDRESS | 28.485427 | -81.251806 | -8082997.0 | 11251620.0 | 29927.335416 |
| 7 | NO ADDRESS | 28.488016 | -81.252468 | -8082397.0 | 11251620.0 | 29533.801061 |
| 8 | NO ADDRESS | 28.490606 | -81.25313 | -8081797.0 | 11251620.0 | 29147.30528 |
| 9 | NO ADDRESS | 28.493195 | -81.253793 | -8081197.0 | 11251620.0 | 28768.131763 |


...and let's now persist this data to a local file.

In [11]:
df_locations.to_pickle('./locations.pkl')    

### Foursquare API
We're interested in venues in the 'dog run' and 'pet store' categories. We will include in our list venues that have 'dog run' or 'dog park' in the category name, and we will make sure to detect and include every venue in the specific 'pet store' categories, as we need information on pet stores in the neighborhood.

Foursquare credentials are defined in the cell below (values redacted here).

In [12]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' #  Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


In [13]:
# The Category IDs that correspond to Pet stores were taken from Foursquare website (https://developer.foursquare.com/docs/resources/categories):

dogrun_category = '4bf58dd8d48988d1e5941735' # 'Root' category for all dogrun-related venues


petstore_categories = ['4bf58dd8d48988d100951735', '5032897c91d4c4b30a586d69', '56aa371be4b08b9a8d573508']


def is_dogrun(categories, specific_filter=None):
    dogrun_words = ['dog run', 'dog park']
    dogrun = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in dogrun_words:
            if r in category_name:
                dogrun = True
        if 'service' in category_name:
            dogrun = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            dogrun = True
    return dogrun, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Florida', '')
    address = address.replace(', United States', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    # Parameter names are lowercase so that they match the names used in .format() below.
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except Exception:
        # On any request or parsing failure, fall back to an empty venue list.
        venues = []
    return venues
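
The query string in `get_venues_near_location` is assembled with positional `str.format`, which makes it easy to mismatch names and values. As an alternative sketch (not the notebook's code), the same URL can be built from a dictionary with the standard library. The helper name `build_explore_url` and the placeholder credentials are assumptions; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version='20180724'):
    """Build the venues/explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the "lat,lon" pair unescaped.
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

print(build_explore_url(28.5488, -81.3836, '4bf58dd8d48988d1e5941735',
                        'PLACEHOLDER_ID', 'PLACEHOLDER_SECRET', radius=350))
```

Naming each parameter next to its value removes the dependency on argument order, and `urlencode` also escapes any characters that are not valid in a query string.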

In [14]:
# Let's now go over our neighborhood locations and get nearby dog runs/parks; we'll also maintain a dictionary of all found dog runs/parks and all found pet stores

import pickle

def get_dogruns(lats, lons):
    dogruns = {}
    petstores = {}
    location_dogruns = []

    print('Obtaining venues around prospective locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, dogrun_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_dogruns = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_dog, is_petstore = is_dogrun(venue_categories, specific_filter=petstore_categories)
            if is_dog:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                dogrun = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_petstore, x, y)
                if venue_distance<=300:
                    area_dogruns.append(dogrun)
                dogruns[venue_id] = dogrun
                if is_petstore:
                    petstores[venue_id] = dogrun
        location_dogruns.append(area_dogruns)
        print(' .', end='')
    print(' done.')
    return dogruns, petstores, location_dogruns

# Try to load from local file system in case we did this before
dogruns = {}
petstores = {}
location_dogruns = []
loaded = False
try:
    with open('dogruns_350.pkl', 'rb') as f:
        dogruns = pickle.load(f)
    with open('petstores_350.pkl', 'rb') as f:
        petstores = pickle.load(f)
    with open('location_dogruns_350.pkl', 'rb') as f:
        location_dogruns = pickle.load(f)
    print('dogrun data loaded.')
    loaded = True
except:
    pass

# If loading fails, use the Foursquare API to get the required data:
if not loaded:
    dogruns, petstores, location_dogruns = get_dogruns(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('dogruns_350.pkl', 'wb') as f:
        pickle.dump(dogruns, f)
    with open('petstores_350.pkl', 'wb') as f:
        pickle.dump(petstores, f)
    with open('location_dogruns_350.pkl', 'wb') as f:
        pickle.dump(location_dogruns, f)
        

dogrun data loaded.
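
The load-or-fetch pattern used in the cell above can be factored into one small helper. The following is a minimal sketch under stated assumptions: `load_or_compute` is a hypothetical name, the cache lives in a temporary directory, and a stand-in function replaces the Foursquare calls.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Return the object pickled at `path`, computing and saving it if missing."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except FileNotFoundError:
        # No cache yet: fall through and build the data.
        pass
    data = compute()
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    return data

# Usage with a stand-in computation instead of the Foursquare calls.
cache_path = os.path.join(tempfile.mkdtemp(), 'demo_cache.pkl')
calls = []

def expensive():
    calls.append(1)
    return {'venues': 3}

first = load_or_compute(cache_path, expensive)   # computes and writes the file
second = load_or_compute(cache_path, expensive)  # reads the file back
print(first == second, len(calls))
```

Catching only `FileNotFoundError` means a corrupted cache file raises an error instead of silently triggering a new round of API requests.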


In [15]:
# Using the imported Numpy library:
print('Total number of dogruns:', len(dogruns))
print('Total number of petstores:', len(petstores))
print('Percentage of petstores: {:.2f}%'.format(len(petstores) / len(dogruns) * 100))
print('Average number of dogruns in neighborhood:', np.array([len(r) for r in location_dogruns]).mean())

Total number of dogruns: 3
Total number of petstores: 0
Percentage of petstores: 0.00%
Average number of dogruns in neighborhood: 0.0273972602739726


In [16]:
print('List of all dogruns')
print('-----------------------')
for r in list(dogruns.values())[:10]:
    print(r)
print('...')
print('Total:', len(dogruns))

List of all dogruns
-----------------------
('4d9097975091a1cd026fbe01', 'Speed Spot 2', 28.484217524528503, -81.2502908706665, '5400-5446 S Econlockhatchee Trail (Pineleaf Wy), Orlando, FL 32829', 200, False, -8083330.618176421, 11251388.78535242)
('4d4565e1bbb1a14367d75172', 'Econ Dog Walk', 28.49655987490795, -81.25649454433085, 'Orlando, FL', 287, False, -8080336.600898355, 11251982.140873503)
('57e433f1498e98fb73b39910', 'K9 Puppies Deal', 28.48828621040672, -81.277394592762, '8156 Charlin Pkwy, Orlando, FL 32822', 335, False, -8081238.488280993, 11256464.71034519)
...
Total: 3


In [17]:
print('List of petstores')
print('---------------------------')
for r in list(petstores.values())[:10]:
    print(r)
print('...')
print('Total:', len(petstores))

List of petstores
---------------------------
...
Total: 0


In [18]:
print('Dogruns around location')
print('---------------------------')
for i in range(6, 20):
    rs = location_dogruns[i][:20]
    names = ', '.join([r[1] for r in rs])
    print('Dogruns around location {}: {}'.format(i+1, names))

Dogruns around location
---------------------------
Dogruns around location 7: Speed Spot 2
Dogruns around location 8: 
Dogruns around location 9: 
Dogruns around location 10: 
Dogruns around location 11: 
Dogruns around location 12: 
Dogruns around location 13: 
Dogruns around location 14: 
Dogruns around location 15: Econ Dog Walk
Dogruns around location 16: 
Dogruns around location 17: 
Dogruns around location 18: 
Dogruns around location 19: 
Dogruns around location 20: 


Let's now see all the dog runs/parks in our area of interest on a map, and also show pet stores in a different color.

In [19]:
map_Orlando = folium.Map(location=Orlando_center, zoom_start=13)
folium.Marker(Orlando_center, popup='Orlando Centroplex').add_to(map_Orlando)

for dog in dogruns.values():
    lat = dog[2]; lon = dog[3]
    is_petstore = dog[6]
    color = 'red' if is_petstore else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_Orlando)
map_Orlando

The progress thus far is encouraging. We now have all the dog runs/parks found in the candidate areas around the Orlando Centroplex, and we know which of those venues are categorised as pet stores. We also know exactly which dog runs are in the vicinity of the center of each prospective neighborhood.

The data gathering is now complete. We will use this data in the **Analysis** stage to produce the report on optimal locations for a new pet store business.

## Methodology <a name="methodology"></a>

For this project the focus will be on detecting areas around Orlando that have a low density of pet stores, especially those adjacent to dog runs/parks. The analysis will be limited to the area within 30 km of a selected central point, the **Orlando Centroplex** in the city center of Orlando, Florida.

In the previous step we collected the required location data, consisting of every **pet store** found within a 30 km radius of the selected central point. We have also identified every park categorised by Foursquare as a **Dog Run** within the same radius.

The next step will be the calculation and exploration of '**Dog Park density**' across different areas of Orlando. Using **heatmaps**, we will then identify a few promising areas close to the center with a high number of dog runs/parks and **no pet stores in the vicinity**, and focus on those areas.

In the last step we will focus on the most promising areas and, within those, create **clusters of locations that meet the requirements** that have been established. We will take into consideration locations with **no more than two dog runs within a radius of 2500 meters** and **no pet stores within a radius of 4000 meters**.
We will present a map of all such locations and also cluster them, using **k-means clustering**, to identify general areas to be taken into consideration when searching for the optimal place to set up shop.
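
The notebook imports scikit-learn's `KMeans` for the clustering step. To make the idea concrete, here is a small pure-Python sketch of the assign-and-update loop behind k-means (Lloyd's algorithm). It is an illustration with made-up points and fixed starting centers, not the scikit-learn implementation, and the function name `kmeans` is hypothetical.

```python
import math

def kmeans(points, centers, iterations=10):
    """Run Lloyd's algorithm on 2D points, starting from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its closest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels

# Two made-up groups of candidate locations, in meters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 0.0)])
print(centers, labels)
```

Each resulting center can be read as the middle of a zone that contains many acceptable locations, which is how the cluster centers are meant to be used when searching for a site.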

## Analysis <a name="analysis"></a>

We should perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the **number of dog runs** in every prospective area:

In [20]:
location_dogruns_count = [len(dog) for dog in location_dogruns]

df_locations['dogruns in area'] = location_dogruns_count

print('Average number of dogruns in every area with radius=300m:', np.array(location_dogruns_count).mean())

df_locations.head(10)

Average number of dogruns in every area with radius=300m: 0.0273972602739726


| | Address | Latitude | Longitude | X | Y | Distance from center | dogruns in area |
|---|---|---|---|---|---|---|---|
| 0 | NO ADDRESS | 28.496009 | -81.246508 | -8080897.0 | 11250060.0 | 29820.965779 | 0 |
| 1 | NO ADDRESS | 28.491619 | -81.248054 | -8081797.0 | 11250580.0 | 29954.773716 | 0 |
| 2 | NO ADDRESS | 28.494208 | -81.248716 | -8081197.0 | 11250580.0 | 29585.950523 | 0 |
| 3 | NO ADDRESS | 28.489818 | -81.250261 | -8082097.0 | 11251100.0 | 29738.980089 | 0 |
| 4 | NO ADDRESS | 28.492407 | -81.250923 | -8081497.0 | 11251100.0 | 29361.317013 | 0 |
| 5 | NO ADDRESS | 28.494997 | -81.251585 | -8080897.0 | 11251100.0 | 28991.152732 | 0 |
| 6 | NO ADDRESS | 28.485427 | -81.251806 | -8082997.0 | 11251620.0 | 29927.335416 | 1 |
| 7 | NO ADDRESS | 28.488016 | -81.252468 | -8082397.0 | 11251620.0 | 29533.801061 | 0 |
| 8 | NO ADDRESS | 28.490606 | -81.25313 | -8081797.0 | 11251620.0 | 29147.30528 | 0 |
| 9 | NO ADDRESS | 28.493195 | -81.253793 | -8081197.0 | 11251620.0 | 28768.131763 | 0 |


The next item is to calculate the **distance to the nearest pet store from the center of every prospective area/location**. We want the distance to the closest pet store regardless of how far away it is, not only to those within the 300 m radius.

In [21]:
distances_to_petstore = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000 # Starting value; stays at 10000 when no pet store is found.
    for pet in petstores.values():
        pet_x = pet[7]
        pet_y = pet[8]
        d = calc_xy_distance(area_x, area_y, pet_x, pet_y)
        if d<min_distance:
            min_distance = d
    distances_to_petstore.append(min_distance)

df_locations['Distance to petstore'] = distances_to_petstore

In [22]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | dogruns in area | Distance to petstore |
|---|---|---|---|---|---|---|---|---|
| 0 | NO ADDRESS | 28.496009 | -81.246508 | -8080897.0 | 11250060.0 | 29820.965779 | 0 | 10000 |
| 1 | NO ADDRESS | 28.491619 | -81.248054 | -8081797.0 | 11250580.0 | 29954.773716 | 0 | 10000 |
| 2 | NO ADDRESS | 28.494208 | -81.248716 | -8081197.0 | 11250580.0 | 29585.950523 | 0 | 10000 |
| 3 | NO ADDRESS | 28.489818 | -81.250261 | -8082097.0 | 11251100.0 | 29738.980089 | 0 | 10000 |
| 4 | NO ADDRESS | 28.492407 | -81.250923 | -8081497.0 | 11251100.0 | 29361.317013 | 0 | 10000 |
| 5 | NO ADDRESS | 28.494997 | -81.251585 | -8080897.0 | 11251100.0 | 28991.152732 | 0 | 10000 |
| 6 | NO ADDRESS | 28.485427 | -81.251806 | -8082997.0 | 11251620.0 | 29927.335416 | 1 | 10000 |
| 7 | NO ADDRESS | 28.488016 | -81.252468 | -8082397.0 | 11251620.0 | 29533.801061 | 0 | 10000 |
| 8 | NO ADDRESS | 28.490606 | -81.25313 | -8081797.0 | 11251620.0 | 29147.30528 | 0 | 10000 |
| 9 | NO ADDRESS | 28.493195 | -81.253793 | -8081197.0 | 11251620.0 | 28768.131763 | 0 | 10000 |


In [23]:
print('Average distance to closest petstore from each area/location center:', df_locations['Distance to petstore'].mean())

Average distance to closest petstore from each area/location center: 10000.0


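Every location shows **10000 m**, which is simply the starting value of `min_distance` in the loop above: the `petstores` dictionary is empty, so no real distance was ever computed. This figure should not be read as a measured distance, and we need to filter our areas carefully!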
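
A fixed starting value such as 10000 is indistinguishable from a measured distance when the venue list is empty. A sketch of a variant that makes the empty case explicit is given below; the function name `nearest_distance` and the `(x, y)` tuple layout are assumptions chosen for illustration, not the notebook's data structures.

```python
import math

def nearest_distance(x, y, venues):
    """Return the distance from (x, y) to the closest venue, or None if there are none."""
    if not venues:
        return None
    return min(math.hypot(vx - x, vy - y) for vx, vy in venues)

print(nearest_distance(0.0, 0.0, []))
print(nearest_distance(0.0, 0.0, [(3.0, 4.0), (30.0, 40.0)]))
```

Stored in a dataframe column, `None` shows up as a missing value, so it drops out of averages instead of distorting them.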

Let's create a map showing a **heatmap / density of dogruns** and try to extract some meaningful information from it. Let's also show the **borders of Orlando neighbourhoods** on our map, along with a few circles indicating distances of 1 km, 2 km and 3 km from the **Orlando Centroplex**.

In [24]:
#Orlando_neigh_url = 'https://raw.githubusercontent.com/TheGamerCodes/Coursera_Capstone/blob/master/Orlando-neighborhoods.geojson'

#Orlando_neigh = requests.get(Orlando_neigh_url).json()

Orlando_neighJSON = 'Orlando-neighborhoods.geojson'

def neighbourhood_style(feature):
    return { 'color': 'blue', 'fill': False }

In [25]:
dogrun_latlons = [[dog[2], dog[3]] for dog in dogruns.values()]

petstore_latlons = [[pet[2], pet[3]] for pet in petstores.values()]

In [26]:
from folium import plugins
from folium.plugins import HeatMap

map_Orlando = folium.Map(location=Orlando_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_Orlando) #cartodbpositron cartodbdark_matter
HeatMap(dogrun_latlons).add_to(map_Orlando)
folium.Marker(Orlando_center).add_to(map_Orlando)
folium.Circle(Orlando_center, radius=1000, fill=False, color='white').add_to(map_Orlando)
folium.Circle(Orlando_center, radius=2000, fill=False, color='white').add_to(map_Orlando)
folium.Circle(Orlando_center, radius=3000, fill=False, color='white').add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando


Let's create another map showing the **heatmap/density of pet stores** only.

In [27]:
###
map_Orlando = folium.Map(location=Orlando_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_Orlando) #cartodbpositron cartodbdark_matter
HeatMap(petstore_latlons).add_to(map_Orlando)
folium.Marker(Orlando_center).add_to(map_Orlando)
folium.Circle(Orlando_center, radius=1000, fill=False, color='white').add_to(map_Orlando)
folium.Circle(Orlando_center, radius=2000, fill=False, color='white').add_to(map_Orlando)
folium.Circle(Orlando_center, radius=3000, fill=False, color='white').add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando

Let's define a new, narrower region of interest.

In [28]:
roi_x_min = Orlando_center_x - 6000
roi_y_max = Orlando_center_y + 3000
roi_width = 15000
roi_height = 15000
roi_center_x = roi_x_min + 7500
roi_center_y = roi_y_max - 7500
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_Orlando = folium.Map(location=roi_center, zoom_start=14)
HeatMap(dogrun_latlons).add_to(map_Orlando)
folium.Marker(Orlando_center).add_to(map_Orlando)
folium.Circle(roi_center, radius=7500, color='white', fill=True, fill_opacity=0.4).add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando



This covers all the pockets of low dogrun density closest to the Orlando city center. Let's also create a new grid of prospective locations restricted to our new region of interest, with the prospective locations 1000 m apart.

In [29]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 1000
y_step = 1000 * k 
roi_y_min = roi_center_y - 7500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'prospective neighborhood centers have been generated.')

20 prospective neighborhood centers have been generated.



OK. Now let's calculate the two most important things for each location candidate: the **number of dogruns in the vicinity** (we'll use a radius of **2500 meters**) and the **distance to the closest pet store**.

In [30]:
def count_dogruns_nearby(x, y, dogruns, radius=2500):    
    count = 0
    for dog in dogruns.values():
        dog_x = dog[7]; dog_y = dog[8]
        d = calc_xy_distance(x, y, dog_x, dog_y)
        if d<=radius:
            count += 1
    return count

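def find_nearest_dogrun(x, y, dogruns):
    # Despite its name, this works for any venue dictionary whose values hold
    # x and y at positions 7 and 8; below it is called with the pet stores.
    # 100000 is returned when no venue lies closer than that (or the dictionary is empty).
    d_min = 100000
    for dog in dogruns.values():
        dog_x = dog[7]; dog_y = dog[8]
        d = calc_xy_distance(x, y, dog_x, dog_y)
        if d<=d_min:
            d_min = d
    return d_min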

roi_dogrun_counts = []
roi_petstore_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_dogruns_nearby(x, y, dogruns, radius=2500)
    roi_dogrun_counts.append(count)
    distance = find_nearest_dogrun(x, y, petstores)
    roi_petstore_distances.append(distance)
print('done.')


Generating data on location candidates... done.
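
The two measures can be illustrated on toy data. The following is a minimal sketch, not the notebook's code: it assumes planar coordinates in meters and assumes that `calc_xy_distance` (defined earlier in the notebook) is a straight-line distance, for which a stand-in is given here. `count_within` and `nearest_distance` are hypothetical names, and the venue coordinates are made up.

```python
import math

def calc_xy_distance(x1, y1, x2, y2):
    # Assumed stand-in for the notebook's helper: straight-line distance in the plane.
    return math.hypot(x2 - x1, y2 - y1)

def count_within(x, y, venue_xys, radius=2500):
    # Number of venues no farther than `radius` from (x, y).
    return sum(1 for vx, vy in venue_xys
               if calc_xy_distance(x, y, vx, vy) <= radius)

def nearest_distance(x, y, venue_xys, cap=100000):
    # Distance to the closest venue, capped at `cap` as in the notebook's function.
    distances = [calc_xy_distance(x, y, vx, vy) for vx, vy in venue_xys]
    return min(distances + [cap])

# Made-up venues around a candidate at the origin.
dog_parks = [(1000, 0), (0, 2400), (3000, 0)]
pet_stores = [(3000, 4000)]

parks_nearby = count_within(0, 0, dog_parks)         # two parks lie within 2500 m
store_distance = nearest_distance(0, 0, pet_stores)  # a 3000-4000-5000 triangle
no_store_distance = nearest_distance(0, 0, [])       # empty input returns the cap

print(parks_nearby, store_distance, no_store_distance)
```

Note that the cap is returned whenever the venue collection is empty or every venue is farther away than the cap, so a column consisting only of the value 100000 is worth checking against the underlying pet store data.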


In [31]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Dogruns nearby':roi_dogrun_counts,
                                 'Distance to Petstore':roi_petstore_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Dogruns nearby | Distance to Petstore |
|---|---|---|---|---|---|---|
| 0 | 28.560121 | -81.351487 | -8062147.0 | 11267260.0 | 0 | 100000 |
| 1 | 28.564446 | -81.352609 | -8061147.0 | 11267260.0 | 0 | 100000 |
| 2 | 28.554723 | -81.35455 | -8063197.0 | 11268130.0 | 0 | 100000 |
| 3 | 28.559047 | -81.355672 | -8062197.0 | 11268130.0 | 0 | 100000 |
| 4 | 28.563371 | -81.356795 | -8061197.0 | 11268130.0 | 0 | 100000 |
| 5 | 28.567696 | -81.357918 | -8060197.0 | 11268130.0 | 0 | 100000 |
| 6 | 28.554081 | -81.358846 | -8063147.0 | 11268990.0 | 0 | 100000 |
| 7 | 28.558404 | -81.35997 | -8062147.0 | 11268990.0 | 0 | 100000 |
| 8 | 28.562729 | -81.361093 | -8061147.0 | 11268990.0 | 0 | 100000 |
| 9 | 28.567053 | -81.362217 | -8060147.0 | 11268990.0 | 0 | 100000 |


We'll now **filter** the locations. We are interested in **locations with no more than two dogruns within a radius of 2500 meters** and **no petstores within a radius of 4000 meters**.

In [32]:
good_dog_count = np.array((df_roi_locations['Dogruns nearby']<=2))
print('Locations with no more than two dogruns nearby:', good_dog_count.sum())

good_pet_distance = np.array(df_roi_locations['Distance to Petstore']>=4000)
print('Locations with no petstores within 4000m:', good_pet_distance.sum())

good_locations = np.logical_and(good_dog_count, good_pet_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]


Locations with no more than two dogruns nearby: 20
Locations with no petstores within 4000m: 20
Locations with both conditions met: 20


#### Let's now see what this looks like on a map:

In [33]:
# Set the latitudes and longitudes of the 'good' locations:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

# Now generate the Map:
map_Orlando = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_Orlando)
HeatMap(dogrun_latlons).add_to(map_Orlando)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_Orlando)
folium.Marker(Orlando_center).add_to(map_Orlando)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando) 
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando


Good so far. We now have several locations that are fairly close to the city center, Orlando Centroplex. Any of those 20 locations is a potential candidate for a new petstore, at least on the criterion of distance from the competition.

Next, we will show those **'good'** locations on a heatmap:

In [34]:
# Generate HeatMap of the 'good' locations:
map_Orlando  = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_Orlando)
folium.Marker(Orlando_center).add_to(map_Orlando)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando 


We now have a clear indication of zones with a low number of dogruns in their vicinity and *no* petstores nearby.
Now we can **cluster** those locations to create **centers of zones containing good locations**. These zones and their centers will be the final result of our analysis.

In [35]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]


map_Orlando  = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_Orlando)
HeatMap(dogrun_latlons).add_to(map_Orlando)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_Orlando)
folium.Marker(Orlando_center).add_to(map_Orlando)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_Orlando) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando





The above clusters represent groupings of the prospective locations, and the cluster centers are placed nicely in the middle of the zones that are full of prospective locations.
The centers of these clusters will be a good starting point for exploring the neighborhoods to find the best possible location based on the specifics of each neighborhood.
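
To make the clustering step more concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic iteration behind k-means: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. This is an illustration under simplifying assumptions (fixed starting centers, made-up points), not scikit-learn's `KMeans` implementation, which uses a more careful initialization. The function names are hypothetical.

```python
import math

def assign(points, centers):
    # For each point, the index of its nearest center.
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]

def update(points, labels, centers):
    # Move each center to the mean of the points assigned to it.
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)  # an empty cluster keeps its previous center
    return new_centers

def lloyd(points, centers, max_iter=100):
    centers = [tuple(float(c) for c in center) for center in centers]
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
        labels = assign(points, centers)
    return centers, labels

# Two made-up groups of candidate locations (planar coordinates in meters).
points = [(0, 0), (0, 1000), (1000, 0), (1000, 1000),
          (5000, 5000), (5000, 6000), (6000, 5000), (6000, 6000)]
centers, labels = lloyd(points, centers=[(0, 0), (6000, 6000)])
print(centers)
print(labels)
```

One design consideration: when the number of clusters approaches the number of points, as with 15 clusters for 20 candidates, most clusters can hold only one or two points, so a smaller number of clusters may summarize the candidates more usefully.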

We can also see those zones on a city map using shaded areas to indicate our clusters instead of a heatmap:

In [36]:
map_Orlando  = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(Orlando_center).add_to(map_Orlando)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_Orlando)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_Orlando) 
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando


Let's zoom in on the prospective areas along **Haven Drive**:

In [37]:
map_Orlando = folium.Map(location=[28.564434732414313, -81.36631128979374], zoom_start=15)
folium.Marker(Orlando_center).add_to(map_Orlando)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_Orlando) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_Orlando)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando

Also along **Hampton Avenue**:

In [38]:
map_Orlando = folium.Map(location=[28.553263, -81.356272], zoom_start=15)
folium.Marker(Orlando_center).add_to(map_Orlando)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(map_Orlando) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_Orlando)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_Orlando)
folium.GeoJson(Orlando_neighJSON, style_function=neighbourhood_style, name='geojson').add_to(map_Orlando)
map_Orlando

Our analysis is now complete. We have identified 15 prospective zone centers; the zones contain locations with a low number of dog runs/parks and no pet stores nearby. All the zones are fairly close to the city center: all of them lie within 4 km of Orlando Centroplex, and about half lie within 2 km.
Although the **green circle areas** shown on the map each have a radius of 500 meters, the actual shape of each zone is very irregular. Therefore, the centers should only be considered starting points for exploring the surrounding neighborhoods in search of potential pet store locations.
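
Distance statements like the ones above can be checked directly from latitude and longitude. The sketch below uses the haversine formula and assumes a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper that does not appear in the notebook. The two sample points are the map centers used for the Haven Drive and Hampton Avenue views.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two (lat, lon) points in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

haven_drive_view = (28.564434732414313, -81.36631128979374)
hampton_avenue_view = (28.553263, -81.356272)

distance_m = haversine_m(*haven_drive_view, *hampton_avenue_view)
print(f"{distance_m:.0f} m between the two map views")
```

To check the zones themselves, the same function could be applied to each cluster center and the coordinates stored in `Orlando_center`, keeping in mind that `cluster_centers` holds `(lon, lat)` pairs.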

## Results and Discussion <a name="results"></a>

The analysis shows that although there is a great number of dog runs/parks in the Orlando, Florida area, there are pockets with a low density of them considerably close to the Orlando city center. The highest concentration of dog runs/parks was detected southeast of the city center, and a lack of pet stores was observed within a 20 km radius of the city center. We therefore focused our attention on proximity to the city center, an area that would be very desirable for setting up a new business.
Directing our attention to this narrower area of interest, we first created a dense grid of prospective locations; the locations were then filtered so that those with more than two dog runs/parks within a radius of 2500 m and those with a pet store closer than 4000 m were removed.

These prospective locations were then clustered to create zones of interest that contain the greatest number of location candidates. A recommendation for further analysis is to generate the addresses of the centers of those zones using reverse geocoding. The addresses can then serve as starting points for more detailed, local analysis based on other factors, depending on individual stakeholder preferences.
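
The reverse geocoding step recommended above could be organized roughly as follows. This is only a sketch: `label_zone_centers` and `fake_reverse_geocode` are hypothetical names, the geocoder is passed in as a plain function so that any service can be plugged in, and the stand-in used here returns a placeholder string rather than a real address. The sample coordinate is the Hampton Avenue map center from the notebook, given in the `(lon, lat)` order that `cluster_centers` uses.

```python
def label_zone_centers(centers_lonlat, reverse_geocode):
    # Build one record per zone center; `reverse_geocode(lat, lon)` returns an address string.
    rows = []
    for lon, lat in centers_lonlat:
        rows.append({
            'Latitude': lat,
            'Longitude': lon,
            'Address': reverse_geocode(lat, lon),
        })
    return rows

def fake_reverse_geocode(lat, lon):
    # Stand-in for a real geocoding service; it does not look anything up.
    return f"address near ({lat:.4f}, {lon:.4f})"

sample_centers = [(-81.356272, 28.553263)]  # (lon, lat)
rows = label_zone_centers(sample_centers, fake_reverse_geocode)
for row in rows:
    print(row)
```

With the geopy package installed, one option would be to pass a function that calls `Nominatim(user_agent=...).reverse((lat, lon))` and returns the `.address` of the result; this requires network access and should respect the usage policy of the service.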

The final result of this project is the identification of 15 zones containing the largest number of potential new business locations, based on the number of and distance to existing venues, both business establishments in general and pet stores in particular.
This project does not conclude or imply that the identified zones are optimal for a new pet store. Rather, the purpose of the analysis was simply to provide information on areas that are close to the Orlando city center and not overrun by existing business establishments, specifically pet stores.
Therefore, the identified (or recommended) zones should only be considered starting points for more detailed analysis, which should culminate in the selection of a location that is far from competition and strategically placed to reach potential customers, among other stakeholder-specific requirements.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas close to the Orlando city center with a low number of stores, particularly pet stores, in order to help entrepreneurs/stakeholders narrow down their search for optimal locations for new pet stores. By calculating the density and distribution of dog runs/parks from Foursquare data, we first identified general neighborhoods that justify further analysis based on proximity to customers. We then generated an extensive collection of locations that satisfied a simple requirement regarding existing nearby petstores.
These locations were subsequently clustered in order to identify the major zones of interest on the map, that is, the zones containing the greatest number of prospective locations.
The final decision on the optimal location for a new pet store will be made by the concerned entrepreneur/stakeholder, based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to dog runs/parks, etc.), pollution levels, and other socio-economic aspects of every neighborhood.