<center><h1>Applied Data Science - Capstone Project</h1>
    <h3>Identifying Ideal Neighborhoods for Fitness Centers</h3>
    <i>Phoenix, Arizona Metropolitan Area Research</i></center>


#### _Designed and Created By T.J. Griesenbrock_

---

Is there an ideal place for an investor to set up a fitness center?  Let's assume that people who eat in restaurants want to work off the calories, so proximity to restaurants is ideal.  In addition, we want areas with the lowest ratio of fitness centers to restaurants (and businesses in general) to be considered as candidates.


## Table of Contents

* [Introduction](#introduction)
  * [Overview](#overview)
  * [Problem](#problem)
  * [Interest](#interest)
* [Data Source and Cleansing Methodology](#dscm)
  * [Data Source](#data)
  * [Initial Cleansing Methodology](#icm)
* [Actual Code Logic](#acl)
  * [Import Required Libraries](#irl)
  * [Global Variables](#gv)
  * [Create Functions](#cf)
    * [build_plots](#bp)
    * [find_new_location](#fnl)
    * [determine_distance](#dd)
    * [get_number_of_venues](#gnov)
    * [calculate_fitness_centers](#cfc)
    * [calculate_ratio](#cr)
    * [set_colors](#sc)
    * [build_new_record](#bnr)
  * [Main Process](#mp)
    * [Build Grid](#mp_bg)
    * [Convert to Pandas DataFrame](#mp_ctpdf)
  * [Current Methodology](#methodology)
  * [Analysis](#analysis)
    * [Build grid of Phoenix - Business to Fitness Centers](#a_bgop_btofc)
    * [Build grid of Phoenix - Restaurants to Fitness Centers](#a_bgop_rtofc)
    * [Heat Map of Phoenix - Business vs. Fitness Centers](#a_hmop_bvfc)
    * [Heat Map of Phoenix - Restaurants vs. Fitness Centers](#a_hmop_rvfc)
  * [Further Analysis](#further_analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
* [Commentary](#Commentary)

## Introduction<a name="introduction" />

### Overview<a name="overview" />

The [metro region of Phoenix, Arizona](https://en.wikipedia.org/wiki/Phoenix_metropolitan_area) is one of the fastest-growing communities in the United States of America.  Comprising many communities that all butt up against each other, Phoenix is an incredibly diverse metro area.  Since 2000, the population has increased by an estimated 1.5 million residents.  With the population growth has come strong growth in businesses, including restaurants and breweries, all serving the needs of the population.

This is a live and active scenario, which presents a number of opportunities for an investor looking for underserved markets.  This research focuses on the need for fitness centers in the metro area.


### Problem<a name="problem" />

There is a massive boom of restaurants and breweries in the area, providing our patrons with plenty of calories that need to be burned off.

Despite the excellent climate, with sun present for over 300 days of the year, the Phoenix metro area also has to deal with high temperatures for a significant portion of the year.  Intensive outdoor exercise is ill-advised in that heat.  So there is a need in the market for fitness centers that give customers a well-regulated environment in which to burn off calories and improve their quality of life.

As businesspeople, we need to determine where best to locate a fitness center.  Our gut feeling says to locate it where there is no competition, but it also needs to be near our customers.  We therefore want a region that has few fitness centers relative to restaurants.  The theory goes that the more customers frequent an area to eat, the more they will frequent the same area to work out.  "Let's burn off the calories we just ate" is a common refrain.

### Interest<a name="interest" />

The interested parties would be investors who wish to establish a fitness center presence in the Phoenix metro area.


## Data Source and Cleansing Methodology<a name="dscm" />

### Data Source<a name="data" />

To achieve this objective, we need to identify locations with the number of restaurants, relative to number of fitness centers in the same area.  We do not know exactly what a fair evaluation scale is, so we will treat this in an iterative manner.

We will be using information from [Foursquare.com](https://foursquare.com/).  Foursquare is a premier source of location data, with detailed information on venues in regions around the world.  We will be tracking the number of businesses in specific categories for the Phoenix metro area.  The data is live and current; no historical research is done for this project.

According to [LatLong.net](https://www.latlong.net/place/phoenix-az-usa-18409.html), the center location for the city of Phoenix, Arizona is 33.448376 (latitude), -112.074036 (longitude).  We will be using this as our general starting point in our grid formation.

### Initial Cleansing Methodology<a name="icm" />

The code uses Python 3.6 on a local Anaconda installation.  Running it elsewhere may require modifications, depending on the Python version and the packages already installed.

We will be building out a grid over the Phoenix metro area.  The grid will be built out at 90-degree angles (north, east, south, and west) from the central location.  We plan to plot a point approximately every 500 meters in each direction as long as there are businesses in the area.  We do have to deal with the curvature of the Earth, but we can ignore this for the most part, as we are dealing with a city located far from the poles.  The formula to calculate a new location is as follows:

    from math import cos, pi

    def find_location(longitude, latitude, direction, distance):
      earth = 6378.137  # Earth's equatorial radius in kilometers.
      m = (1 / ((2 * pi / 360) * earth)) / 1000  # Degrees per meter.

      if direction == 'N':
        new_latitude = latitude + (distance * m)
        new_longitude = longitude
      elif direction == 'S':
        new_latitude = latitude - (distance * m)
        new_longitude = longitude
      elif direction == 'E':
        # East is the positive longitude direction.
        new_latitude = latitude
        new_longitude = longitude + (distance * m) / cos(latitude * (pi / 180))
      elif direction == 'W':
        new_latitude = latitude
        new_longitude = longitude - (distance * m) / cos(latitude * (pi / 180))
      else:
        raise ValueError('direction must be one of N, S, E, W')

      return new_longitude, new_latitude

To determine the distance between the two points, we can use the Haversine Formula.  The following code can be used:

    from math import radians, cos, sin, asin, sqrt
    def haversine(lon1, lat1, lon2, lat2):
      # convert decimal degrees to radians
      lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

      # haversine formula
      dlon = lon2 - lon1
      dlat = lat2 - lat1
      a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
      c = 2 * asin(sqrt(a))
      r = 6371 # Radius of earth in kilometers.
      return c * r
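
As a quick cross-check of the two formulas, the sketch below steps 500 meters north of the Phoenix starting point using the degrees-per-meter factor, then measures that step with a compact haversine.  The name `haversine_km` and its `r` parameter are illustrative, not part of the notebook.  Because the step formula uses the equatorial radius (6378.137 km) and the haversine uses the mean radius (6371 km), the measured distance comes out slightly under 500 meters, a difference of roughly 0.1%.

```python
from math import radians, cos, sin, asin, sqrt, pi

def haversine_km(lon1, lat1, lon2, lat2, r=6371.0):
    """Great-circle distance in kilometers on a sphere of radius r (km)."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * r

# Degrees of latitude per meter, using the equatorial radius from the step formula.
degrees_per_meter = (1 / ((2 * pi / 360) * 6378.137)) / 1000

start_lat, start_lon = 33.448376, -112.074036  # Phoenix center used in this project
north_lat = start_lat + 500 * degrees_per_meter  # step 500 m north

measured_m = haversine_km(start_lon, start_lat, start_lon, north_lat) * 1000
print(round(measured_m, 2))  # slightly less than 500 because of the radius mismatch
```

For a 500-meter grid this mismatch is negligible, but it is one reason to use a single radius constant (or a library such as geopy) for both operations.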

The logic to build out the grid is as follows:

    With a starting spot,
      - if no restaurant, abort.
      - Otherwise, spawn new search out in all 4 directions, spaced out 500 meters

    For every following location:
      If this location was already done, stop.
      Retain direction where it came from.
      Generate a number of businesses, restaurants, and fitness centers within the radius of 1 to 5 kilometers.
      If this location is more than 100 kilometers away from the starting point, stop.
      If this location has business,
        - Spawn new search out in 3 remaining directions.
        - Otherwise, stop.

    Done.
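
The expansion described above can be sketched as a breadth-first search over integer grid cells, where one cell stands for one 500-meter step.  This is only an illustration under stated assumptions: `expand_grid` and the `has_business` callback are hypothetical names, the callback stands in for the Foursquare lookup, and distance is measured in grid steps rather than kilometers.

```python
from collections import deque

# Unit moves on an integer grid; one cell corresponds to one step of the grid.
MOVES = {'N': (0, 1), 'E': (1, 0), 'S': (0, -1), 'W': (-1, 0)}
OPPOSITE = {'N': 'S', 'S': 'N', 'E': 'W', 'W': 'E'}

def expand_grid(has_business, max_steps):
    """Return the cells reached from (0, 0) that have at least one business.

    has_business(cell) is a hypothetical callback standing in for the venue
    lookup; max_steps caps the straight-line distance from the start.
    """
    start = (0, 0)
    if not has_business(start):
        return []  # nothing at the starting spot: abort
    visited = {start}
    kept = [start]
    # Each queue entry is (cell we are leaving, direction of travel).
    queue = deque((start, direction) for direction in MOVES)
    while queue:
        (x, y), direction = queue.popleft()
        dx, dy = MOVES[direction]
        cell = (x + dx, y + dy)
        if cell in visited:
            continue  # this location was already done
        visited.add(cell)
        if cell[0] ** 2 + cell[1] ** 2 > max_steps ** 2:
            continue  # too far from the starting point
        if not has_business(cell):
            continue  # no business here: do not spawn further searches
        kept.append(cell)
        # Spawn searches in the three directions other than the way back.
        for next_direction in MOVES:
            if next_direction != OPPOSITE[direction]:
                queue.append((cell, next_direction))
    return kept

cells = expand_grid(lambda cell: True, max_steps=2)
print(len(cells))
```

Marking a cell as visited before checking it means a location without businesses is never queried twice, which matters when every lookup counts against an API quota.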

With the resulting values, we can calculate the ratio of fitness centers to restaurants and to businesses.  Then we can overlay a heat map on the actual map of the Phoenix metro area to display the strength of each ratio.

We should also detail the top 10 areas with the lowest ratio of fitness centers to restaurants.  This evaluation should consider whether there is a fitness center 'desert' (an area with no fitness center regardless of the restaurant/business presence).  If such deserts exist, we should seek the ones with the highest concentration of restaurants/businesses.
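
One possible way to express this ranking is sketched below in plain Python.  The function name `rank_candidates` and the sample records are made up for illustration; only the field names `num_restaurants` and `num_fitness` mirror the columns used later in this notebook.  Deserts have a ratio of zero, so they sort first, and ties are broken in favor of the area with more restaurants.

```python
def rank_candidates(records, top_n=10):
    """Order areas by fitness-centers-per-restaurant, lowest first.

    Areas without restaurants are dropped, since the premise is that
    diners are the customers.  Ties go to the busier area.
    """
    with_food = [r for r in records if r['num_restaurants'] > 0]

    def sort_key(record):
        ratio = record['num_fitness'] / record['num_restaurants']
        return (ratio, -record['num_restaurants'])

    return sorted(with_food, key=sort_key)[:top_n]

# Hypothetical sample data, not taken from the Foursquare results.
sample = [
    {'name': 'A', 'num_restaurants': 40, 'num_fitness': 0},
    {'name': 'B', 'num_restaurants': 5, 'num_fitness': 0},
    {'name': 'C', 'num_restaurants': 100, 'num_fitness': 2},
    {'name': 'D', 'num_restaurants': 10, 'num_fitness': 5},
    {'name': 'E', 'num_restaurants': 0, 'num_fitness': 0},
]

ranked = rank_candidates(sample)
print([record['name'] for record in ranked])
```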

Finally, we should evaluate further from the data gathered whether there are any other categories that may negatively impact the necessity of a fitness center.

This will be an iterative process, and the above will obviously be expanded in the final results.

## Actual Code Logic<a name="acl" />

The following is the actual code used for this investigation.

### Import Required Libraries<a name="irl" />

In [1]:
# Functions to handle data within dataframes.
import numpy as np

# Used to build and parse the dataframe.
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Help determine distance.
import geopy
from geopy.distance import great_circle
from geopy.distance import GreatCircleDistance

# HTML request
import requests as req

# Sys for error handling
import sys

# JSON to parse returned code.
import json

# map rendering library
import folium
from folium import plugins
from folium.plugins import HeatMap

# Random Numbers
from random import randint

# To save data locally:
import pickle

print('Libraries imported.')

Libraries imported.


### Global Variables<a name="gv" />

This section contains sensitive information that is masked for security purposes.

In [2]:
# global variables
step_distance = 5000 # meters.
scan_distance = 6000 # meters.
maximum_count = 50 # business to fetch.
max_distance = 100000 # meters. (100km)

# Master list of plot points with number of stores.
master_list = []
column_names = ['latitude', 
                'longitude', 
                'dist_from_center', 
                'num_business', 
                'num_restaurants', 
                'num_fitness',
                'business_to_fitness',
                'restaurants_to_fitness']

# Current central location for Phoenix, Arizona
start_latitude = 33.448376
start_longitude = -112.074036
start_location = geopy.Point(start_latitude, start_longitude)

zoom_start_value = 8

# Define directions' compass bearing
directions_bearing = {'N': 0, 
                      'E': 90, 
                      'S': 180, 
                      'W': 270}

fs_version = '20190617'

We need to define `fs_client_id` and `fs_client_secret`; both are assigned to each individual developer.  Thus, we need to mask the following cell:

In [3]:
fs_client_id = 'ENTER YOUR CODE'  # MASK
fs_client_secret = 'ENTER YOUR CODE'  # MASK

### Create Functions<a name="cf" />

The following functions handle tasks that are repeated several times.  Descriptions of the functions are provided as needed.

#### build_plots<a name='bp' />

Build out a series of plot points from the starting point to the ending point in a specified direction, spaced by the given distance.

The starting point, all intermediate plot points, and the ending point are collected into one list and returned.

Note:  To sidestep floating-point error, the loop stops once the next point is within half a step of the end point.

In [4]:
def build_plots(bp_start, bp_end, bp_distance, bp_direction):
    list_plots = []
    
    step_distance = GreatCircleDistance(meters = bp_distance)
    next_step = bp_start
    
    while great_circle(next_step, bp_end).meters > (bp_distance / 2):
        list_plots += [next_step]
        next_step = step_distance.destination(point=next_step, bearing=bp_direction)

    list_plots += [bp_end]
    return(list_plots)
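
To see why the half-step rule helps, here is a one-dimensional sketch of the same loop using plain floats instead of `geopy` points.  The name `build_steps` is hypothetical.  Adding `0.1` ten times lands just below `1.0`, so a naive `position < end` test would emit a near-duplicate of the end point; comparing the remaining gap against half a step avoids that.

```python
def build_steps(start, end, step):
    """Walk from start toward end (end > start) in increments of step."""
    points = []
    position = start
    # Stop once the remaining gap is at most half a step, so accumulated
    # rounding error cannot produce a near-duplicate of the end point.
    while end - position > step / 2:
        points.append(position)
        position += step
    points.append(end)
    return points

steps = build_steps(0.0, 1.0, 0.1)
print(len(steps))
```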

#### find_new_location<a name="fnl" />

Obtain the latitude and longitude of a new location based on the direction from the original location.  The result is returned as a `geopy.Point`.

In [5]:
def find_new_location(fnl_location, fnl_direction, fnl_distance):
    step_location = GreatCircleDistance(meters = fnl_distance)
    return step_location.destination(point=fnl_location, bearing=directions_bearing[fnl_direction])

#### determine_distance<a name="dd" />

Use geopy's `great_circle` to determine distance.  The result is returned as meters.

In [6]:
def determine_distance(dd_begin, dd_end): 
    return great_circle(dd_begin, dd_end).meters

#### get_number_of_venues<a name="gnov" />

This is used to get total number of businesses and restaurants in a specific radius.

Global variables:  `fs_client_id`, `fs_client_secret`, `fs_version`.

In [7]:
def get_number_of_venues(gnov_location, gnov_radius, gnov_section=None):
    url = ('https://api.foursquare.com/v2/venues/explore?' + 
           'client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
               fs_client_id, 
               fs_client_secret, 
               fs_version, 
               str(gnov_location.latitude), 
               str(gnov_location.longitude),
               gnov_radius) +
           '&limit=1&sortByDistance=1&offset=0')
        
    if gnov_section is not None:
        url += '&section={}'.format(gnov_section)
    
    try:
        venue_count = req.get(url).json()['response']['totalResults']
    except Exception:
        # Any request or parsing failure is reported as -1.
        venue_count = -1
    
    #venue_count = randint(0, 50)

    return venue_count
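
The count is read from `['response']['totalResults']` in the decoded JSON, with `-1` as the failure value.  As a sketch of how that extraction could be isolated and tested without calling the API, the hypothetical helper below works on an already-decoded dictionary; the sample payloads are made up and contain only the keys this notebook relies on.

```python
def extract_total_results(payload):
    """Return the venue count from a decoded explore response, or -1 if absent."""
    try:
        return int(payload['response']['totalResults'])
    except (KeyError, TypeError, ValueError):
        # Missing keys, a non-dict payload, or a non-numeric value.
        return -1

print(extract_total_results({'response': {'totalResults': 12}}))
print(extract_total_results({'response': {}}))
```

Catching only the expected exception types keeps genuine programming errors visible instead of silently turning them into `-1`.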

#### calculate_fitness_centers<a name="cfc" />

Unlike `get_number_of_venues` above, the fitness center search is a focused category search, and it returns at most `maximum_count` results (50).

Global variables: `fs_client_id`, `fs_client_secret`, `fs_version`, `maximum_count`.

In [8]:
def calculate_fitness_centers(cfc_location, cfc_radius):
    url = 'https://api.foursquare.com/v2/venues/search'

    params = dict(
        client_id=fs_client_id,
        client_secret=fs_client_secret,
        v=fs_version,
        ll=str(cfc_location.latitude) + ',' + str(cfc_location.longitude),
        radius=cfc_radius,
        intent='browse',
        categoryId='4bf58dd8d48988d175941735',
        limit=maximum_count
    )
    
    try:
        venue_count = len(req.get(url=url, params=params).json()['response']['venues'])
    except Exception:
        # Any request or parsing failure is reported as -1.
        venue_count = -1
    
    #venue_count = randint(0, 10)
    
    return venue_count

#### calculate_ratio<a name="cr" />

Set up ratios based on internal rules.

In [9]:
def calculate_ratio(num, den):
    value = None
    
    if num == 0:
        value = 0.0
    elif den == 0:
        value = num
    else:
        value = num / den

    return value
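
The rules are: a zero numerator gives `0.0`, a zero denominator returns the numerator unchanged (as if there were one fitness center), and everything else is a plain division.  One edge case these rules do not cover is the `-1` that the two Foursquare helpers return on failure, which would produce a negative ratio.  A guarded variant might look like the sketch below; `safe_ratio` is a hypothetical name and is not used elsewhere in this notebook.

```python
def safe_ratio(num, den):
    """Same rules as calculate_ratio, but None when either count is the -1 error value."""
    if num < 0 or den < 0:
        return None  # at least one lookup failed, so the ratio is unknown
    if num == 0:
        return 0.0
    if den == 0:
        return float(num)  # treated as if there were a single fitness center
    return num / den

print(safe_ratio(5, 0), safe_ratio(4, 2), safe_ratio(4, -1))
```

Note that an area with one business and no fitness center gets the same ratio as an area with one business and one fitness center, which is why the later analysis filters on `num_fitness` directly.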

#### set_colors<a name="sc" />

Set up color based on ranges.

In [10]:
def set_colors(value):
    temp_color = None
    
    # Let's set some colors.
    if value == 0.0:
        temp_color = 'black'
    elif value <= 1.0:
        temp_color = 'red'
    elif value <= 5.0:
        temp_color = 'lightred'
    elif value <= 10.0:
        temp_color = 'orange'
    elif value <= 50.0:
        temp_color = 'green'
    else:
        temp_color = 'blue'

    return temp_color

#### build_new_record<a name="bnr" />

Starting from the center location, we build a grid of plot points within the maximum distance, pull the business, restaurant, and fitness center counts for each point from Foursquare, and collect the results as records.

Global variables:  `scan_distance`, `start_longitude`, `directions_bearing`.

In [11]:
def build_new_record(center_location, max_distance, step_distance):
    # No `global` statement is needed: the globals used here are only read, never reassigned.
    counter = 0
    master_list = []
    
    south_location = find_new_location(center_location, 'S', max_distance)
    north_location = find_new_location(center_location, 'N', max_distance)
    lat_distance = determine_distance(south_location, north_location)
    
    steps = lat_distance / step_distance
    
    # Build plots and return as list.
    plot_latitude_steps = build_plots(south_location, north_location, step_distance, directions_bearing['N'])
        
    for plot_latitude in plot_latitude_steps:
        step_location = geopy.Point(plot_latitude.latitude, start_longitude)
        west_location = find_new_location(step_location, 'W', max_distance)
        east_location = find_new_location(step_location, 'E', max_distance)
        long_distance = determine_distance(west_location, east_location)
        long_step = long_distance / steps

        plot_longitude_steps = build_plots(west_location, east_location, long_step, directions_bearing['E'])
        
        for plot_location in plot_longitude_steps:
            plot_distance = determine_distance(plot_location, center_location)
            if max_distance > plot_distance:
                plot_business = get_number_of_venues(plot_location, scan_distance)
                if plot_business > 0:
                    plot_restaurants = get_number_of_venues(plot_location, scan_distance, 'food')
                    plot_fitness_center = calculate_fitness_centers(plot_location, scan_distance)
                    
                    # Let's do some calculation.
                    plot_BtoFC = calculate_ratio(plot_business, plot_fitness_center)
                    plot_RtoFC = calculate_ratio(plot_restaurants, plot_fitness_center)

                    # Add to list.
                    master_list += [[plot_location.latitude,
                                     plot_location.longitude,
                                     plot_distance,
                                     plot_business, 
                                     plot_restaurants, 
                                     plot_fitness_center, 
                                     plot_BtoFC, 
                                     plot_RtoFC]]
                    
                    if counter >= 10:
                        counter = 0
                        print('.', end='')
                    else:
                        counter += 1
    
    return master_list

### Main Process<a name="mp" />

The following block will call the above functions as needed, and provide detailed results as desired.

#### Build Grid<a name="mp_bg" />

Let's build out the grid - first load it if we already have a local copy.  Otherwise, pull from Foursquare.  To regenerate, we need to make sure there's no local copy of `master_list_teej_capstone.pkl`.

In [12]:
try:
    with open('master_list_teej_capstone.pkl', 'rb') as cur_file:
        master_list = pickle.load(cur_file)
except:    
    print("Getting data.", end="")
    
    master_list = build_new_record(start_location, max_distance, step_distance)
    
    print("\nRegenerated data.  Saving...")
    
    # Save this.
    with open('master_list_teej_capstone.pkl', 'wb') as cur_file:
        pickle.dump(master_list, cur_file)
    
    print("Done.")
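
The same load-or-regenerate pattern can be wrapped in a small helper, sketched below.  The names `load_or_build`, `fake_build`, and the demo file are hypothetical.  Unlike the cell above, this version only rebuilds when the file is missing, so a corrupted pickle would raise an error instead of silently triggering a fresh round of API calls.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Return the pickled object at path, building and saving it when the file is missing."""
    try:
        with open(path, 'rb') as cur_file:
            return pickle.load(cur_file)
    except FileNotFoundError:
        data = build()
        with open(path, 'wb') as cur_file:
            pickle.dump(data, cur_file)
        return data

# Demonstration with a throwaway file and a stand-in for build_new_record.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_master_list.pkl')
calls = []

def fake_build():
    calls.append(1)  # count how often the expensive step runs
    return [[33.448376, -112.074036, 0.0, 1, 1, 0, 1.0, 1.0]]

first = load_or_build(demo_path, fake_build)   # builds and saves
second = load_or_build(demo_path, fake_build)  # loads from disk
print(len(calls))
```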

#### Convert to Pandas DataFrame<a name="mp_ctpdf" />

Also show the first few records and size of dataframe to confirm this buildout is correct.

In [13]:
df_master_list = pd.DataFrame.from_records(master_list, columns=column_names)

df_master_list.head()

|   | latitude | longitude | dist_from_center | num_business | num_restaurants | num_fitness | business_to_fitness | restaurants_to_fitness |
|---|---|---|---|---|---|---|---|---|
| 0 | 32.58934 | -112.340899 | 98707.814288 | 1 | 1 | 0 | 1.0 | 1.0 |
| 1 | 32.589227 | -111.807212 | 98719.045818 | 1 | 0 | 1 | 1.0 | 0.0 |
| 2 | 32.679109 | -111.646676 | 94354.943567 | 1 | 1 | 0 | 1.0 | 1.0 |
| 3 | 32.679097 | -111.593254 | 96562.756118 | 1 | 1 | 0 | 1.0 | 1.0 |
| 4 | 32.679086 | -111.539832 | 98971.926627 | 5 | 4 | 0 | 5.0 | 4.0 |


## Current Methodology<a name="methodology" />

Previously, I disclosed that we would try to be as accurate as possible in terms of grid locations given the curvature of the Earth: points are spaced evenly along the latitude axis, but along the longitude axis there is mild deviation, particularly for the city of Phoenix.  This approach is excellent if you wish to reuse the method for any city in the world.

Unfortunately, trying to home-brew a formula emulating the Haversine formula led to a conversion problem that took me a significant chunk of time to resolve.  The resolution was to use geopy's great-circle functionality.  Thus a few functions were eliminated, and a couple of others vastly simplified.

The second problem is with the Foursquare usage policy.  My research area is extremely large: a grid extending 100 kilometers from the center would result in 125,655 grid points if we space them every 500 meters.  Just increasing the spacing to 1 kilometer reduces the number of grid points to 31,419.  This still bumps against the Foursquare usage policy, which allows only 50,000 queries a day.  In the worst-case scenario (up to three queries per grid point), we would be querying 94,000 times, far beyond what is allowed.  The solution is to increase the spacing to 5 kilometers, with the search circles overlapping by around 1.25% to ensure we are not missing stores at the edges.  The number of plot points drops to 1,253 in the worst-case scenario.
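
These figures can be roughly reproduced with a back-of-the-envelope estimate: the number of grid points inside a circle is about the circle's area divided by the area of one grid cell.  The sketch below is an approximation, not the notebook's actual count, so its results land close to the numbers quoted above without matching them exactly.  The names `estimate_grid_points`, `DAILY_LIMIT`, and `QUERIES_PER_POINT` are illustrative.

```python
from math import pi

def estimate_grid_points(radius_m, spacing_m):
    """Approximate number of grid points inside a circle: circle area / cell area."""
    return round(pi * radius_m ** 2 / spacing_m ** 2)

DAILY_LIMIT = 50000     # daily query cap quoted in the text
QUERIES_PER_POINT = 3   # businesses, restaurants, and fitness centers

for spacing in (500, 1000, 5000):
    points = estimate_grid_points(100000, spacing)
    queries = points * QUERIES_PER_POINT
    print(spacing, points, queries, queries <= DAILY_LIMIT)
```

Because the point count falls with the square of the spacing, going from 500 meters to 5 kilometers cuts the workload by a factor of about 100.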

The third problem is also related to the Foursquare usage policy.  It is very easy to fetch the number of restaurants and total businesses, but querying for fitness centers requires us to do a search.  Under the developer license, we are also not able to query multiple pages, perhaps as an effort by the company to prevent excessive scraping of data without proper compensation.  So, the maximum number of fitness centers reported for any searched area will be 50.  This is important especially in areas where there are multitudes of hotels and businesses: a lot of workplaces have their own fitness centers that may not be accessible to the public but are still listed in Foursquare, a reflection of the completeness of its data.  Our research capacity in such areas is limited.  However, we are focused on the scarcity, not the abundance, of fitness centers relative to businesses and restaurants.

We want not only to color-code the number of businesses and restaurants relative to fitness centers, but also to highlight locations where there are no fitness centers as potential areas of focus, especially where they coincide with a high number of businesses/restaurants.  Let's get started.

## Analysis<a name="analysis" />

#### Build grid of Phoenix - Business to Fitness Centers<a name="a_bgop_btofc" />

Let's build out a list of plot points around Phoenix, with colors set using the ratio of business to fitness centers.

In [14]:
# create map of Phoenix using latitude and longitude values
map_phoenix_bus = folium.Map(location=[start_latitude, start_longitude], zoom_start=zoom_start_value)

# add markers to map
for lat, lng, BtoFC in zip(df_master_list['latitude'], 
                           df_master_list['longitude'],
                           df_master_list['business_to_fitness']):
    label = '{}'.format(BtoFC)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=set_colors(BtoFC),
        fill=True,
        fill_color=set_colors(BtoFC),
        fill_opacity=0.7,
        parse_html=False).add_to(map_phoenix_bus)  
    
map_phoenix_bus

#### Build grid of Phoenix - Restaurants to Fitness Centers<a name="a_bgop_rtofc" />

Let's build out a list of plot points around Phoenix, with colors set using the ratio of restaurants to fitness centers.

In [15]:
# create map of Phoenix using latitude and longitude values
map_phoenix_rest = folium.Map(location=[start_latitude, start_longitude], zoom_start=zoom_start_value)

# add markers to map
for lat, lng, RtoFC in zip(df_master_list['latitude'], 
                           df_master_list['longitude'],
                           df_master_list['restaurants_to_fitness']):
    label = '{}'.format(RtoFC)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=set_colors(RtoFC),
        fill=True,
        fill_color=set_colors(RtoFC),
        fill_opacity=0.7,
        parse_html=False).add_to(map_phoenix_rest)  
    
map_phoenix_rest

#### Heat Map of Phoenix - Business vs. Fitness Centers<a name="a_hmop_bvfc" />

Let's filter the records and only show areas that DO NOT have fitness centers, but do have businesses.

In [16]:
df_filtered_list = df_master_list[df_master_list['num_fitness'] == 0]

location_list_bus = [[row['latitude'],row['longitude']] for index, row in df_filtered_list.iterrows()]

# create map of Phoenix using latitude and longitude values
map_phoenix_heat_bus = folium.Map(location=[start_latitude, start_longitude], zoom_start=zoom_start_value)

HeatMap(location_list_bus).add_to(map_phoenix_heat_bus)

map_phoenix_heat_bus

#### Heat Map of Phoenix - Restaurants vs. Fitness Centers<a name="a_hmop_rvfc" />

Let's filter the records and only show areas that DO NOT have fitness centers, but do have restaurants.

In [17]:
df_filtered_list = df_master_list[(df_master_list['num_restaurants'] != 0) & (df_master_list['num_fitness'] == 0)]

location_list_res = [[row['latitude'],row['longitude']] for index, row in df_filtered_list.iterrows()]

# create map of Phoenix using latitude and longitude values
map_phoenix_heat_res = folium.Map(location=[start_latitude, start_longitude], zoom_start=zoom_start_value)

HeatMap(location_list_res).add_to(map_phoenix_heat_res)

map_phoenix_heat_res

### Further Analysis<a name="further_analysis" />

The initial results provided some interesting insight: apparently, Phoenix is well covered by fitness centers in well-populated areas.  The Phoenix metro area is surrounded by four Indian reservations (Gila River to the south, for example).  They are all sparsely populated, but still have certain businesses that serve the community.  Highways and other roads also have businesses, but obviously not fitness centers (people on the road need fuel for their vehicles and bodies, not a workout).

Let's try this again -- let's look for areas with restaurants, and ONLY ONE fitness center nearby.  The hypothesis follows that a monopoly fitness center could be disrupted by a competitor.

In [18]:
df_filtered_list = df_master_list[(df_master_list['num_restaurants'] != 0) & (df_master_list['num_fitness'] == 1)]

location_list_res = [[row['latitude'],row['longitude']] for index, row in df_filtered_list.iterrows()]

# create map of Phoenix using latitude and longitude values
map_phoenix_heat_res = folium.Map(location=[start_latitude, start_longitude], zoom_start=zoom_start_value)

HeatMap(location_list_res).add_to(map_phoenix_heat_res)

map_phoenix_heat_res

Amazingly enough, there are still sparsely populated areas that have fitness centers!

Hmm, let's try a different tack.  How about places that have a very high ratio of restaurants to fitness centers?  That should do the trick!

In [19]:
df_filtered_list = df_master_list[(df_master_list['restaurants_to_fitness'] > 20)]

location_list_res = [[row['latitude'],row['longitude']] for index, row in df_filtered_list.iterrows()]

# create map of Phoenix using latitude and longitude values
map_phoenix_heat_res = folium.Map(location=[start_latitude, start_longitude], zoom_start=zoom_start_value)

HeatMap(location_list_res).add_to(map_phoenix_heat_res)

map_phoenix_heat_res

Note:  This was done iteratively.  No results were returned at a greater-than-50 ratio.  Nothing at a greater-than-40 ratio!  Nor greater-than-30!

At a greater-than-20 ratio, we finally got one hit!  This is in the area of south Buckeye, in the extreme west of the Phoenix metropolitan area.

Further investigation of that particular area should follow to determine whether it is a viable candidate for establishing a fitness center.
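
The manual iteration over thresholds could also be automated.  The sketch below walks down a list of cutoffs and stops at the first one that yields any hits; the function name `first_threshold_with_hits` and the sample ratios are made up for illustration and are not the Foursquare results.

```python
def first_threshold_with_hits(ratios, thresholds=(50, 40, 30, 20)):
    """Return (threshold, hits) for the first threshold exceeded by any ratio."""
    for threshold in thresholds:
        hits = [ratio for ratio in ratios if ratio > threshold]
        if hits:
            return threshold, hits
    return None, []  # nothing exceeded even the lowest threshold

sample_ratios = [0.0, 1.0, 4.0, 12.5, 23.0]  # hypothetical values
threshold, hits = first_threshold_with_hits(sample_ratios)
print(threshold, hits)
```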

## Results and Discussion<a name="results" />


We set out with a hypothesis that people who eat out will need a local fitness center to burn off the fuel.

We discovered that the distribution of businesses and restaurants closely tracks the distribution of fitness centers in the vast majority of the valley.  The areas that have no fitness centers are so sparsely populated (not detailed here) that locating a fitness center there would most likely generate minimal revenue.  One possible alternative is a "nature-based" fitness center, but that market is most likely already served, since the highlighted areas with only one fitness center are themselves sparsely populated.

By focusing on the ratios, we discovered an area in the west of the Phoenix metro area that appears to be underserved.  However, my past experience in this area is that Buckeye is a rapidly growing region, with many new planned communities being developed.

Further investigation would require a much more granular model.  This could be considered the Starbucks growth model, where the belief is that customers do not have the energy to cross the street to work out.  Fortunately, the logic I built above provides sufficient tools to continue iterating.  Unfortunately, the Foursquare license limits constrain this effort: a finer-grained scan would be a multi-day process done in segments.  This is way beyond the scope of this lab.

## Conclusion<a name="conclusion" />


#### _We are the Buckeyes, rah rah rah!_

Based on the current data, the best place to investigate further for establishing a brand-new fitness center is the southern region of the city of Buckeye.

## Commentary<a name="Commentary" />

This project was estimated to be a 30-hour project.  I have spent over 100 hours on it, due to the number of roadblocks I ran into.

Initially, I was focused on what was mostly proposed by the author of this class: determining the best place for a person to move to in Toronto.  So I tried to map it against what I would like satisfied if my wife and I moved to Toronto, with a multitude of conditions (not near an airport, close to fitness centers, and so forth).  However, there was one problem with my analysis:

I bit off far more than I could chew given my expertise with Python at the time.  Mastery of Python is different from mastery of other programming languages.  I found myself falling back on idioms from my old standby languages and ended up tripping over myself trying to get them to work in a Python environment.  In addition, my data conditions were way too complex to come up with a solid analysis plan.  I would need hundreds of hours of study, and improvement in my skills with these new tools, before I could come up with a solid working plan.

I pivoted, and fell back to my knowledge base – my 13 years of living in Phoenix, Arizona.  I know the neighborhood, and I know the needs of the population there – fitness centers.

Then came the second part of my issue:  I was going to home-brew everything from scratch in calculating the distance for each plot point, taking care to go beyond what was provided as an example in the Battle of the Neighborhood in Berlin.  But I kept tripping on a number of invalid assumptions that reflected my limited knowledge of handling latitude and longitude, along with many bugs in my codebase caused by my incorrect assumptions about how global variables are handled in Python, among other things.

My initial plan slammed against Foursquare's daily usage limits far too often.  Again, I bit off more than I could chew.

I learned so much from this, but I had only a few days before I had to submit it.  Despite the fact that I did the first 4 weeks within a matter of days, I ended up taking almost 4 weeks to do this project.

I do hope this is sufficient to prove that I am on my way to becoming a capable Data Scientist, with strong and growing knowledge of Python and its data science tools.

Thank you.