# Location Recommender for Maggie & Maggie's Beach Library and Bar #
__Developed by:__ Jono Pike 
<br>
__Development environment:__ Python 3.6 in IBM Watson Studio
<br>
__Hosted on:__ GitHub, https://github.com/Soltis12/

## Table of Contents
[1. Background](#1_Background)<br>
* [1a. Project Requirements](#1a_ProjectRequirements)<br>

[2. Prepare the development environment](#2_Prepare_DE)<br>
* [2a. Import the required packages and libraries](#2a_Import_PL)<br>
* [2b. Import and prepare the US Cities dataset](#2b_Import_USCities)<br>
* [2c. Set variables based on stakeholder requirements](#2c_Stakeholder)<br>

[3. Clean the Data: Coastal Locations](#3_Clean_Data)<br>
* [3a. Refine the data, excluding locations of no use to stakeholders](#3a_Refine)<br>
* [3b. Link the refined cities data to the Foursquare data](#3b_Foursquare)<br>
* [3c. Plot the current list onto a map to give summary thus far](#3c_Plot1)<br>
* [3d. Filter the dataset to coastal locations only](#3d_Filter_Coastal)<br>

[4. Enhance the Data: Local Area Populations](#4_Enhance_Data)<br>
* [4a. Determine local population catchment area based on lat / lon](#4a_Refine)<br>
* [4b. Run a function to get the sum of local populations for each location](#4b_localpops)<br>

[5. Enhance the Data: Other Local Businesses](#5_Enhance_Local)<br>
* [5a. Determine local businesses nearby and whether they would have positive / negative impact](#5a_Refine)<br>
* [5b. Generate a local business score based on good minus bad](#5b_BusScore)<br>

[6. Final Recommendation: Top 5](#6_Final_Rec)<br>

---

### 1. Background<a name="1_Background"></a> ###

This is a final project for a Data Science course on Coursera, to demonstrate my understanding and ability to apply skills learned.
<br>
<br>
Two friends of mine, both named Maggie, want to open a business located on a beach. They want to create a Beach Library and Bar, where customers can relax on the beach with a drink whilst renting a book.
<br>
<br>
The business will henceforth be referred to as '_Maggie & Maggie's_'.

#### 1a. Project Requirements<a name="1a_ProjectRequirements"></a> ####

Requirements from __Coursera__ are:
 -  The development environment must be Python
 -  Tool developed must use Foursquare data

Requirements from __the Stakeholders__ are:
 -  The location must be a beach, in the contiguous USA
 -  US States the stakeholders will not consider are:
     -  California
     -  Nebraska
     -  Montana
     -  Alabama
     -  Iowa
     -  Kansas
 -  US States the stakeholders favour are:
     -  Florida
     -  North Carolina


---

### 2. Prepare the development environment<a name="2_Prepare_DE"></a> ###

#### 2a. Import the required packages and libraries<a name="2a_Import_PL"></a>  ####

In [1]:
# Import the necessary libraries required for the tool
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# map rendering library
!conda install -c conda-forge folium=0.5.0 --yes
import folium 

# safely evaluate strings that contain Python literals
from ast import literal_eval

# count occurrences in a dictionary
from collections import Counter

# perform calculations on latitude / longitude
import math

print ('Packages and libraries imported')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Packages and libraries imported


#### 2b. Import and prepare the US Cities dataset<a name="2b_Import_USCities"></a>  ####
Source of this data: https://simplemaps.com/data/us-cities

In [2]:
# Load the US Cities data from GitHub repository
usa_cities_raw = pd.read_csv('https://raw.githubusercontent.com/Soltis12/coursera_capstone/master/09_usa_cities.csv')
usa_cities_raw.head()

Unnamed: 0,city,city_ascii,state_id,state_name,county_fips,county_name,lat,lng,population,population_proper,density,source,incorporated,timezone,zips,id
0,Prairie Ridge,Prairie Ridge,WA,Washington,53053,Pierce,47.1443,-122.1408,,,1349.8,polygon,False,America/Los_Angeles,98360 98391,1840037882
1,Edison,Edison,WA,Washington,53057,Skagit,48.5602,-122.4311,,,127.4,polygon,False,America/Los_Angeles,98232,1840017314
2,Packwood,Packwood,WA,Washington,53041,Lewis,46.6085,-121.6702,,,213.9,polygon,False,America/Los_Angeles,98361,1840025265
3,Wautauga Beach,Wautauga Beach,WA,Washington,53035,Kitsap,47.5862,-122.5482,,,261.7,point,False,America/Los_Angeles,98366,1840037725
4,Harper,Harper,WA,Washington,53035,Kitsap,47.5207,-122.5196,,,342.1,point,False,America/Los_Angeles,98366,1840037659


#### 2c. Set variables based on stakeholder requirements<a name="2c_Stakeholder"></a>  ####

In [3]:
# Variables that relate to this dataset from stakeholder requirements:

# List of excluded states provided by stakeholders, matching dataset entry above
excluded_states = ['California','Nebraska','Alabama','Iowa','Kansas','Montana']
preferred_states = ['Florida','North Carolina']

# Minimum population for the city to be considered
min_population = 100

# Maximum distance for beach to be from city location in metres
max_distance = 2000

# Local area population distance in miles
local_miles = 10

# Minimum population for the local area
local_min_pop = 100000

# Location requested
wanted_location = 'beach'

# Advantageous nearby businesses: tourist stuff
business_good = ['Bar','Gift','Shop','Tour','Ice Cream','Snack','Food','Restaurant','Surf','Sand','Beach','Sea','Ocean']

# Disadvantageous nearby businesses: child-centric stuff
business_bad = ['Kid','Child','Play','School','Kindergarten','Nursery','Mother']

# Distance of businesses
business_distance = 1000

print('Variables defined')

Variables defined


### 3. Clean the Data: Coastal Locations <a name="3_Clean_Data"></a> ###

#### 3a. Refine the data, excluding locations of no use to stakeholders <a name="3a_Refine"></a> ####

In [4]:
# Refine the data. Only include fields that are relevant.
usa_cities = usa_cities_raw[['city','state_name','lat','lng','population']]
usa_cities.head()

Unnamed: 0,city,state_name,lat,lng,population
0,Prairie Ridge,Washington,47.1443,-122.1408,
1,Edison,Washington,48.5602,-122.4311,
2,Packwood,Washington,46.6085,-121.6702,
3,Wautauga Beach,Washington,47.5862,-122.5482,
4,Harper,Washington,47.5207,-122.5196,


In [5]:
# Exclude any rows with a null, as all fields require populated data.
usa_cities = usa_cities.dropna()
usa_cities.head()

Unnamed: 0,city,state_name,lat,lng,population
6,Kahlotus,Washington,46.6436,-118.5566,189.0
8,Washtucna,Washington,46.7539,-118.3104,195.0
10,Toledo,Washington,46.4412,-122.8494,738.0
12,Renton,Washington,47.4757,-122.1904,100953.0
13,Chehalis,Washington,46.6649,-122.966,7498.0


In [6]:
# Get the distinct state names from the dataset, so the list of excluded states can exactly match the names in the dataset.
distinct_states = usa_cities['state_name'].unique()
distinct_states

array(['Washington', 'Virginia', 'Delaware', 'District of Columbia',
       'Wisconsin', 'West Virginia', 'Hawaii', 'Florida', 'Wyoming',
       'New Hampshire', 'New Jersey', 'New Mexico', 'Texas', 'Louisiana',
       'North Carolina', 'North Dakota', 'Nebraska', 'Tennessee',
       'New York', 'Pennsylvania', 'California', 'Nevada', 'Colorado',
       'Alaska', 'Alabama', 'Arkansas', 'Vermont', 'Illinois', 'Georgia',
       'Indiana', 'Iowa', 'Oklahoma', 'Arizona', 'Idaho', 'Connecticut',
       'Maine', 'Maryland', 'Massachusetts', 'Ohio', 'Utah', 'Missouri',
       'Minnesota', 'Michigan', 'Rhode Island', 'Kansas', 'Montana',
       'Mississippi', 'South Carolina', 'Kentucky', 'Oregon',
       'South Dakota'], dtype=object)
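
The exclusion filter only works if the stakeholder names match the dataset's spelling exactly, so it can help to check this before filtering. Below is a minimal sketch of such a check in plain Python. The helper name `unmatched_states` and the shortened sample list are assumptions made for illustration and are not part of the original notebook.

```python
# Hypothetical helper: report any requested state names that are absent from the dataset.
def unmatched_states(requested, available):
    """Return the requested names that have no exact match in `available`, in order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]

# A shortened sample of the state names listed in the output above
dataset_states = ['Washington', 'Florida', 'North Carolina', 'Nebraska',
                  'California', 'Alabama', 'Iowa', 'Kansas', 'Montana']
excluded = ['California', 'Nebraska', 'Alabama', 'Iowa', 'Kansas', 'Montana']

# An empty list means every excluded state will be matched by the filter
print(unmatched_states(excluded, dataset_states))

# A name spelt differently from the dataset is reported back
print(unmatched_states(['N. Carolina'], dataset_states))
```

In the notebook itself, the same check could be run against `distinct_states` before applying the `isin` filter.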

In [7]:
# Refine the dataset to remove the excluded states
usa_cities = usa_cities[~usa_cities.state_name.isin(excluded_states)]

# Check the output to ensure the excluded states have been removed from the dataframe
distinct_states_2 = usa_cities['state_name'].unique()
distinct_states_2

array(['Washington', 'Virginia', 'Delaware', 'District of Columbia',
       'Wisconsin', 'West Virginia', 'Hawaii', 'Florida', 'Wyoming',
       'New Hampshire', 'New Jersey', 'New Mexico', 'Texas', 'Louisiana',
       'North Carolina', 'North Dakota', 'Tennessee', 'New York',
       'Pennsylvania', 'Nevada', 'Colorado', 'Alaska', 'Arkansas',
       'Vermont', 'Illinois', 'Georgia', 'Indiana', 'Oklahoma', 'Arizona',
       'Idaho', 'Connecticut', 'Maine', 'Maryland', 'Massachusetts',
       'Ohio', 'Utah', 'Missouri', 'Minnesota', 'Michigan',
       'Rhode Island', 'Mississippi', 'South Carolina', 'Kentucky',
       'Oregon', 'South Dakota'], dtype=object)

In [8]:
# Refine the dataset to only include preferred states
usa_cities = usa_cities[usa_cities['state_name'].isin(preferred_states)]
print(usa_cities.head())
usa_cities.shape

              city state_name      lat      lng  population
3322       Alachua    Florida  29.7779 -82.4827      4493.0
3325      Bushnell    Florida  28.6852 -82.1166      4439.0
3326  Apalachicola    Florida  29.7282 -84.9940      3850.0
3328      Tequesta    Florida  26.9619 -80.1012      5990.0
3329        Dundee    Florida  28.0123 -81.6004      4209.0


(956, 5)

In [9]:
# Refine the dataset to remove any cities too small to be considered
usa_cities = usa_cities[usa_cities['population'] > min_population]
usa_cities = usa_cities.reset_index()
usa_cities = usa_cities.drop(['index'], axis = 1) # Drop the old index
usa_cities['city_id'] = usa_cities.index
usa_cities.head()

Unnamed: 0,city,state_name,lat,lng,population,city_id
0,Alachua,Florida,29.7779,-82.4827,4493.0,0
1,Bushnell,Florida,28.6852,-82.1166,4439.0,1
2,Apalachicola,Florida,29.7282,-84.994,3850.0,2
3,Tequesta,Florida,26.9619,-80.1012,5990.0,3
4,Dundee,Florida,28.0123,-81.6004,4209.0,4


In [10]:
# Identify number of cities at this stage, with a population over threshold, not in excluded states.
# Further refinements to go: cities near a shoreline.
usa_cities.shape

(938, 6)

#### 3b. Link the refined cities data to the Foursquare data <a name="3b_Foursquare"></a> ####
This identifies the locations in the cities dataset that have a beach within max_distance metres.

In [11]:
# The code was removed by Watson Studio for sharing.
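
The hidden cell presumably defined `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`, since the Foursquare calls that follow rely on those three names. One possible way to supply them without storing secrets in the notebook is to read them from environment variables. The sketch below is an assumption, not the original code: the function name, the environment variable names and the placeholder version date are all hypothetical.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the three values the Foursquare calls expect; raise if a secret is missing."""
    # The environment variable names below are assumptions made for this sketch.
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    # The API version is a date string in YYYYMMDD form; this default is only a placeholder.
    version = env.get('FOURSQUARE_VERSION', '20180605')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret, version

# Demonstration with a plain dictionary standing in for the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret, demo_version = load_foursquare_credentials(demo_env)
print(demo_id, demo_version)
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read the real environment, and the result could be unpacked into `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`.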

In [12]:
# First, test the code on a single city that I know has a beach
# Get the index number of Miami, Florida
miami_df = usa_cities[(usa_cities['city'] == 'Miami') & (usa_cities['state_name'] == 'Florida')]
miami_df

Unnamed: 0,city,state_name,lat,lng,population,city_id
117,Miami,Florida,25.784,-80.2102,6247425.0,117


In [13]:
# Get the index number of Miami, Florida for use as a variable to pass to get location data
city_sample = int(usa_cities[(usa_cities['city'] == 'Miami') & (usa_cities['state_name'] == 'Florida')].index[0])
city_sample

117

In [14]:
# Create a blank dataframe where city and has beach flag can be appended
city_has_beach = pd.DataFrame(columns=['beachnum','city','state_name',])
city_has_beach

Unnamed: 0,beachnum,city,state_name


In [15]:
# Look up the coordinates, state and name of the sample city (Miami)

city_latitude = usa_cities.loc[city_sample, 'lat'] # neighborhood latitude value
city_longitude = usa_cities.loc[city_sample, 'lng'] # neighborhood longitude value
city_state = usa_cities.loc[city_sample, 'state_name'] # state name
city_name = usa_cities.loc[city_sample, 'city'] # city name

print('Latitude and longitude values of {}, {} are {}, {}.'.format(city_name,
                                                               city_state, 
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Miami, Florida are 25.784, -80.2102.


In [16]:
# Build the Foursquare call
latitude = city_latitude
longitude = city_longitude
radius = max_distance
LIMIT = 5
cat_id = '4bf58dd8d48988d1e2941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, cat_id, LIMIT)

# Call the data
results = requests.get(url).json()
results_string = json.dumps(results)
results_string

'{"meta": {"code": 200, "requestId": "5d7d1a0f787dba0030b83f58"}, "response": {"venues": [{"id": "53c4b779498e5269b662d16a", "name": "Miami Beach", "location": {"lat": 25.78909864412232, "lng": -80.2217425770895, "labeledLatLngs": [{"label": "display", "lat": 25.78909864412232, "lng": -80.2217425770895}], "distance": 1288, "cc": "US", "city": "Miami", "state": "FL", "country": "United States", "formattedAddress": ["Miami, FL", "United States"]}, "categories": [{"id": "4bf58dd8d48988d1e2941735", "name": "Beach", "pluralName": "Beaches", "shortName": "Beach", "icon": {"prefix": "https://ss3.4sqi.net/img/categories_v2/parks_outdoors/beach_", "suffix": ".png"}, "primary": true}], "referralId": "v-1568479759", "hasPerk": false}, {"id": "4d90a29b788c54811d9668fd", "name": "79th & Collins", "location": {"lat": 25.776462664795087, "lng": -80.18767270719864, "labeledLatLngs": [{"label": "display", "lat": 25.776462664795087, "lng": -80.18767270719864}], "distance": 2408, "cc": "US", "city": "Mia

In [17]:
# Search the Foursquare string for the word 'Beach'
beaches_count = results_string.count('"shortName": "Beach"')
beaches_count

2
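
Counting a substring in the serialised JSON works here, but it depends on the exact spacing produced by `json.dumps` and on the `shortName` field. A possible alternative is to walk the parsed response and compare category IDs. The sketch below assumes the response layout shown in the output above (`response` → `venues` → `categories`). The function name is hypothetical, and the second venue in the sample is invented purely to show a non-match.

```python
BEACH_CATEGORY_ID = '4bf58dd8d48988d1e2941735'  # the category ID used in the call above

def count_beach_venues(results, category_id=BEACH_CATEGORY_ID):
    """Count venues that list the given category among their categories."""
    venues = results.get('response', {}).get('venues', [])
    count = 0
    for venue in venues:
        categories = venue.get('categories', [])
        if any(cat.get('id') == category_id for cat in categories):
            count += 1
    return count

# Trimmed sample: the first venue is taken from the output above,
# the second is a made-up venue with no categories.
sample_results = {
    'meta': {'code': 200},
    'response': {'venues': [
        {'name': 'Miami Beach',
         'categories': [{'id': '4bf58dd8d48988d1e2941735', 'shortName': 'Beach'}]},
        {'name': 'Hypothetical venue', 'categories': []},
    ]},
}

print(count_beach_venues(sample_results))
```

Because missing keys fall back to empty containers, an error response with no `venues` list simply yields a count of zero.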

In [18]:
# Pass the row elements as key value pairs to append() function 
city_has_beach = city_has_beach.append({'city' : city_name , 'state_name' : city_state, 'beachnum': beaches_count} , ignore_index=True)
city_has_beach

Unnamed: 0,beachnum,city,state_name
0,2,Miami,Florida


In [19]:
# Create a function to iterate through all cities in the usa_cities DataFrame to identify count of beaches
def find_a_beach(the_dataframe,
                 test_run = 1): # Better testing environment by giving option to limit the number of iterations
    
    # Create a blank dataframe where city and has beach flag can be appended.
    city_has_beach = pd.DataFrame(columns=['beachnum','city','state_name'])
    
    # Identify the number of required iterations: one per row.
    # (range() excludes its upper bound, so using the maximum index alone would skip the last city.)
    cities = len(the_dataframe.index)
    
    # Number of iterations limited to ten if test run is true.
    if test_run == 1:
        totalruns = 10
    else:
        totalruns = cities

    # Loop for each city in the DataFrame.
    for i in range(totalruns):
        
        # Define the variables for the Foursquare call
        city_latitude = the_dataframe.loc[i, 'lat'] # neighborhood latitude value
        city_longitude = the_dataframe.loc[i, 'lng'] # neighborhood longitude value
        city_state = the_dataframe.loc[i, 'state_name'] # state name
        city_name = the_dataframe.loc[i, 'city'] # city name
        
        # Build the Foursquare call
        latitude = city_latitude
        longitude = city_longitude
        radius = max_distance
        LIMIT = 5
        cat_id = '4bf58dd8d48988d1e2941735'
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, cat_id, LIMIT)
        
        # Call the data
        results = requests.get(url).json()
        results_string = json.dumps(results)
        
        # Search the Foursquare string for the word 'Beach'
        beaches_count = results_string.count('"shortName": "Beach"')
        
        # Pass the row elements as key value pairs to append() function 
        city_has_beach = city_has_beach.append({'city' : city_name , 'state_name' : city_state, 'beachnum': beaches_count} , ignore_index=True)
    
    # Present the dataframe
    global output
    output = city_has_beach[city_has_beach['beachnum'] > 0]
    output = output.reset_index() # Re-index
    output = output.drop(['index'], axis = 1) # Drop the old index
    output['index_col'] = output.index
    result_count = (output['index_col'].max()+1)
    
    print('{} cities were returned from your search'.format(result_count))
    
    return output

print('Function created')

Function created
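
The request URL in the function above is assembled with one long `format` string. As an alternative, the query string could be built from a dictionary so that each parameter is named next to its value. This is a minimal sketch using only the standard library. The function name `build_search_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are copied from the call above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, category_id, limit):
    """Build the venue search URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'categoryId': category_id,
        'limit': limit,
    }
    # safe=',' keeps the comma in the 'll' value unescaped, as in the original URL
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials; the coordinates and radius are the Miami values used earlier
url = build_search_url('demo-id', 'demo-secret', '20180605',
                       25.784, -80.2102, 2000,
                       '4bf58dd8d48988d1e2941735', 5)
print(url)
```

The resulting string can be passed to `requests.get` exactly as before.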


In [20]:
# Run the function
cities_with_beaches = find_a_beach(usa_cities,0)
cities_with_beaches

224 cities were returned from your search


Unnamed: 0,beachnum,city,state_name,index_col
0,4,Tequesta,Florida,0
1,4,North Palm Beach,Florida,1
2,5,Indian Rocks Beach,Florida,2
3,4,Belleair Beach,Florida,3
4,3,Miami Springs,Florida,4
5,1,Bradenton,Florida,5
6,5,Briny Breezes,Florida,6
7,1,Miami Lakes,Florida,7
8,2,Estero,Florida,8
9,1,Lake Placid,Florida,9


In [21]:
# Inner Join the cities with beaches to the usa cities dataset
output_cities = pd.merge(cities_with_beaches, usa_cities, on=['city','state_name'])
output_cities = output_cities.drop(['beachnum','index_col'], axis = 1) # Drop redundant fields
output_cities.head()

Unnamed: 0,city,state_name,lat,lng,population,city_id
0,Tequesta,Florida,26.9619,-80.1012,5990.0,3
1,North Palm Beach,Florida,26.8217,-80.0574,12993.0,7
2,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,9
3,Belleair Beach,Florida,27.924,-82.8365,1623.0,10
4,Miami Springs,Florida,25.8195,-80.2896,14431.0,12


In [22]:
# Show shape of the dataset, to show how many cities at this stage of the analysis
print(output_cities.shape)

(224, 6)


#### 3c. Plot the current list onto a map to give summary thus far <a name="3c_Plot1"></a> ####

In [23]:
# create a map of the south-eastern USA, centred on the latitude and longitude of the sample city (Miami)
map_usa_1 = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, state_name, city in zip(output_cities['lat'], output_cities['lng'], output_cities['state_name'], output_cities['city']):
    label = '{}, {}'.format(city, state_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_usa_1)  
    
map_usa_1

#### 3d. Filter the dataset to coastal locations only <a name="3d_Filter_Coastal"></a> ####

In [24]:
# Manual sampling at this stage shows prospective coastal locations, which are favoured by the stakeholders, contain words such as 'beach', 'coast' etc.
# Therefore, put these into a list to further filter the dataset.
string_match = ['Beach','Coast','Shore','Sand','Sands']

# Filter the dataset again to display coastal locations
output_cities = output_cities[output_cities['city'].str.contains('|'.join(string_match), na = False)]
output_cities.shape

(64, 6)
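
The filter above matches substrings, so a name that merely contains one of the words (for example a city beginning with 'Sandy') would also pass. If that becomes a problem, whole-word matching is one option. The sketch below uses the standard `re` module. The function name is hypothetical, 'Shores' is added to the list because whole-word matching would otherwise miss it, and 'Sandy Springs' is used only as an illustrative name.

```python
import re

# Same words as string_match above, plus 'Shores' for whole-word matching
COASTAL_WORDS = ['Beach', 'Coast', 'Shore', 'Shores', 'Sand', 'Sands']

# \b marks a word boundary, so 'Sand' no longer matches inside 'Sandy'
COASTAL_PATTERN = re.compile(r'\b(?:' + '|'.join(COASTAL_WORDS) + r')\b')

def is_coastal_name(city):
    """Return True when the city name contains one of the coastal words as a whole word."""
    return bool(COASTAL_PATTERN.search(city))

for name in ['Daytona Beach', 'Belleair Shores', 'Sandy Springs', 'Miami']:
    print(name, is_coastal_name(name))
```

With pandas, the same pattern string could be passed to `str.contains`, which treats its argument as a regular expression by default.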

In [25]:
# Refresh the map, which now shows only coastal locations
map_usa_2 = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, state_name, city in zip(output_cities['lat'], output_cities['lng'], output_cities['state_name'], output_cities['city']):
    label = '{}, {}'.format(city, state_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_usa_2)  
    
map_usa_2

### 4. Enhance the Data: Local Area Populations <a name="4_Enhance_Data"></a> ###

#### 4a. Determine local population catchment area based on lat / lon <a name="4a_Refine"></a> ####

In [26]:
# A refresh of the columns currently present in the dataset
output_cities.head()

Unnamed: 0,city,state_name,lat,lng,population,city_id
1,North Palm Beach,Florida,26.8217,-80.0574,12993.0,7
2,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,9
3,Belleair Beach,Florida,27.924,-82.8365,1623.0,10
11,Daytona Beach,Florida,29.1959,-81.0935,66645.0,32
15,Bradenton Beach,Florida,27.4649,-82.6957,1262.0,38


In [27]:
# One degree of latitude is approximately 69.172 miles
# Source: https://gis.stackexchange.com/questions/142326/calculating-longitude-length-in-miles
output_cities['lat_area_upper'] = output_cities['lat']+((1/69.172)*local_miles)
output_cities['lat_area_lower'] = output_cities['lat']-((1/69.172)*local_miles)
output_cities.head()

Unnamed: 0,city,state_name,lat,lng,population,city_id,lat_area_upper,lat_area_lower
1,North Palm Beach,Florida,26.8217,-80.0574,12993.0,7,26.966267,26.677133
2,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,9,28.040867,27.751733
3,Belleair Beach,Florida,27.924,-82.8365,1623.0,10,28.068567,27.779433
11,Daytona Beach,Florida,29.1959,-81.0935,66645.0,32,29.340467,29.051333
15,Bradenton Beach,Florida,27.4649,-82.6957,1262.0,38,27.609467,27.320333


In [28]:
# The length of one degree of longitude depends on the latitude. For example:
#  Convert the latitude into radians, e.g. ~0.65038
#  Take the cosine of the value in radians: ~0.79585
#  1 degree of longitude = ~0.79585 * 69.172 = ~55.051 miles
output_cities['lng_area_upper'] = output_cities['lng'] + ((1/(np.cos(np.radians(output_cities['lat']))*69.172))*local_miles)
output_cities['lng_area_lower'] = output_cities['lng'] - ((1/(np.cos(np.radians(output_cities['lat']))*69.172))*local_miles)
output_cities.head()

Unnamed: 0,city,state_name,lat,lng,population,city_id,lat_area_upper,lat_area_lower,lng_area_upper,lng_area_lower
1,North Palm Beach,Florida,26.8217,-80.0574,12993.0,7,26.966267,26.677133,-79.895405,-80.219395
2,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,9,28.040867,27.751733,-82.680725,-83.007875
3,Belleair Beach,Florida,27.924,-82.8365,1623.0,10,28.068567,27.779433,-82.672883,-83.000117
11,Daytona Beach,Florida,29.1959,-81.0935,66645.0,32,29.340467,29.051333,-80.927894,-81.259106
15,Bradenton Beach,Florida,27.4649,-82.6957,1262.0,38,27.609467,27.320333,-82.53277,-82.85863
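
The two conversions above can be checked by hand for a single city. The sketch below repeats the same arithmetic with the standard `math` module for North Palm Beach, using the 69.172 miles-per-degree figure from the cells above. The function name `bounding_box` is an assumption made for illustration.

```python
import math

MILES_PER_DEGREE_LAT = 69.172  # figure used in the cells above

def bounding_box(lat, lng, miles):
    """Return the latitude / longitude limits of a box extending `miles` in each direction."""
    lat_delta = miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    lng_delta = miles / (math.cos(math.radians(lat)) * MILES_PER_DEGREE_LAT)
    return {
        'lat_area_upper': lat + lat_delta,
        'lat_area_lower': lat - lat_delta,
        'lng_area_upper': lng + lng_delta,
        'lng_area_lower': lng - lng_delta,
    }

# North Palm Beach, Florida, with the 10-mile setting from local_miles
box = bounding_box(26.8217, -80.0574, 10)
for key, value in box.items():
    print(key, round(value, 6))
```

The printed limits should agree with the first row of the table above to within rounding.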


#### 4b. Run a function to get the sum of local populations for each location <a name="4b_localpops"></a> ####

In [29]:
# For every city in the output_cities dataset, get a sum of the population where cities are within:
#  latitude upper and lower ranges, and
#  longitude upper and lower ranges
# Create a function to do this

def get_populations(cities_df, output_df):
    
    # Reset Index
    output_df = output_df.reset_index()
    
    # Get the number of iterations based on count of rows
    city_count = len(output_df.index)
    
    # Create a blank dataframe where city and population sum can be appended.
    city_population = pd.DataFrame(columns=['city','state_name','local_area_pop'])
    
    for i in range(city_count):
        row_num = i
        city_state = output_df.loc[row_num, 'state_name'] # state name
        city_name = output_df.loc[row_num, 'city'] # city name
        area_lat_up = output_df.loc[row_num, 'lat_area_upper'] # upper latitude value
        area_lat_lo = output_df.loc[row_num, 'lat_area_lower'] # lower latitude value
        area_lng_up = output_df.loc[row_num, 'lng_area_upper'] # upper longitude value
        area_lng_lo = output_df.loc[row_num, 'lng_area_lower'] # lower longitude value

        
        pop_sum = cities_df['population'][(cities_df.lat >= area_lat_lo)
                                          & (cities_df.lat <= area_lat_up)
                                          & (cities_df.lng >= area_lng_lo)
                                          & (cities_df.lng <= area_lng_up)].sum()
        
        city_population = city_population.append({'city' : city_name , 'state_name' : city_state, 'local_area_pop': pop_sum} , ignore_index=True)

    return city_population

print('Function created for Get Populations')

Function created for Get Populations


In [30]:
# Run the function to get the population within the bounding box that extends local_miles (10 miles) in each direction
pops_of_cities = get_populations(usa_cities, output_cities)
print(pops_of_cities.head())
print(pops_of_cities.shape)

                 city state_name  local_area_pop
0    North Palm Beach    Florida        305151.0
1  Indian Rocks Beach    Florida        362321.0
2      Belleair Beach    Florida        350273.0
3       Daytona Beach    Florida        202544.0
4     Bradenton Beach    Florida        785216.0
(64, 3)
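
The population sum above is taken over a square bounding box rather than a true circle, so cities near the corners of the box lie slightly further than 10 miles away. If a strict radius were wanted, a great-circle distance could be used instead. The following is a minimal sketch of that alternative, not the method used in this notebook. The function names and the Earth radius constant are assumptions, and the three sample rows are taken from the tables above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two latitude / longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def population_within(cities, lat, lng, miles):
    """Sum the population of every city within `miles` of the given point."""
    return sum(c['population'] for c in cities
               if haversine_miles(lat, lng, c['lat'], c['lng']) <= miles)

# Three rows from the tables above
sample_cities = [
    {'city': 'Indian Rocks Beach', 'lat': 27.8963, 'lng': -82.8443, 'population': 4316.0},
    {'city': 'Belleair Beach', 'lat': 27.924, 'lng': -82.8365, 'population': 1623.0},
    {'city': 'Daytona Beach', 'lat': 29.1959, 'lng': -81.0935, 'population': 66645.0},
]

# Population within 10 miles of Indian Rocks Beach, using only the sample rows
print(population_within(sample_cities, 27.8963, -82.8443, 10))
```

With only these three rows, the two neighbouring Gulf coast cities are included and Daytona Beach, on the opposite coast, is not.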


In [31]:
# Inner join this table to the main output dataset
output_cities = pd.merge(output_cities, pops_of_cities, on=['city','state_name'])
output_cities = output_cities.drop(['lat_area_upper','lat_area_lower','lng_area_upper','lng_area_lower', 'city_id'], axis = 1) # Drop redundant fields
output_cities.head()

Unnamed: 0,city,state_name,lat,lng,population,local_area_pop
0,North Palm Beach,Florida,26.8217,-80.0574,12993.0,305151.0
1,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,362321.0
2,Belleair Beach,Florida,27.924,-82.8365,1623.0,350273.0
3,Daytona Beach,Florida,29.1959,-81.0935,66645.0,202544.0
4,Bradenton Beach,Florida,27.4649,-82.6957,1262.0,785216.0


In [32]:
# Remove rows where the local area population is lower than the minimum set in the stakeholder variables
output_cities_2 = output_cities[output_cities['local_area_pop'] >= local_min_pop]
output_cities_2 = output_cities_2.reset_index()
print(output_cities_2.head())
print(output_cities_2.shape)

   index                city state_name      lat      lng  population  \
0      0    North Palm Beach    Florida  26.8217 -80.0574     12993.0   
1      1  Indian Rocks Beach    Florida  27.8963 -82.8443      4316.0   
2      2      Belleair Beach    Florida  27.9240 -82.8365      1623.0   
3      3       Daytona Beach    Florida  29.1959 -81.0935     66645.0   
4      4     Bradenton Beach    Florida  27.4649 -82.6957      1262.0   

   local_area_pop  
0        305151.0  
1        362321.0  
2        350273.0  
3        202544.0  
4        785216.0  
(38, 7)


In [33]:
# Refresh the map, which now shows only coastal locations that meet the minimum local area population
map_usa_3 = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, state_name, city in zip(output_cities_2['lat'], output_cities_2['lng'], output_cities_2['state_name'], output_cities_2['city']):
    label = '{}, {}'.format(city, state_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_usa_3)  
    
map_usa_3

### 5. Enhance the Data: Other Local Businesses <a name="5_Enhance_Local"></a> ###

#### 5a. Determine local businesses nearby and whether they would have positive / negative impact <a name="5a_Refine"></a> ####

In [34]:
# Create a function to iterate through all cities in the output_cities_2 DataFrame to identify count of good and bad businesses
def find_a_beach(the_dataframe, good_bus, bad_bus,
                 test_run = 1): # Better testing environment by giving option to limit the number of iterations
    
    # Create a blank dataframe where each city's good and bad business counts can be appended.
    city_businesses = pd.DataFrame(columns=['city','state_name','goodcount','badcount'])
    
    # Identify the number of required iterations: number of rows.
    cities = len(the_dataframe.index)
    
    # Number of iterations limited to ten if test run is true.
    if test_run == 1:
        totalruns = 10
    else:
        totalruns = cities

    # Loop for each city in the DataFrame.
    for i in range(totalruns):
        
        # Define the variables for the Foursquare call
        city_latitude = the_dataframe.loc[i, 'lat'] # neighborhood latitude value
        city_longitude = the_dataframe.loc[i, 'lng'] # neighborhood longitude value
        city_state = the_dataframe.loc[i, 'state_name'] # state name
        city_name = the_dataframe.loc[i, 'city'] # city name
        
        # Build the Foursquare call
        latitude = city_latitude
        longitude = city_longitude
        radius = business_distance
        LIMIT = 5
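        # Note: this is the same Foursquare 'Beach' category ID used in section 3b,
        # so only venues in that category are returned and scored.
        cat_id = '4bf58dd8d48988d1e2941735'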
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, cat_id, LIMIT)
        
        # Call the data
        results = requests.get(url).json()
        results_string = json.dumps(results)
        
        # Search the Foursquare string for the words related to advantageous businesses
        good_count = 0
        for i_g in good_bus:
            good_ones = results_string.count(i_g)
            good_count = good_count + good_ones
        
        # Search the Foursquare string for the words related to disadvantageous businesses
        bad_count = 0
        for i_b in bad_bus:
            bad_ones = results_string.count(i_b)
            bad_count = bad_count + bad_ones
        
        # Pass the row elements as key value pairs to append() function 
        city_businesses = city_businesses.append({'city' : city_name , 'state_name' : city_state, 'goodcount': good_count, 'badcount': bad_count} , ignore_index=True)

    global city_output
    city_output = city_businesses[city_businesses['goodcount'] > 0]
            
    # Present the dataframe
    return city_output

print('Function created')

Function created
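
The function above counts keyword occurrences anywhere in the serialised response, so one venue can contribute several hits (for example through its name, category and plural category name). One possible refinement is to score each venue at most once per list, based on its name. The sketch below illustrates that idea in plain Python. The function name and the sample venue names are hypothetical, and this is not the scoring used for the results in this notebook.

```python
def business_score(venue_names, good_words, bad_words):
    """Return (good, bad, score), counting each venue at most once per word list."""
    good = sum(1 for name in venue_names
               if any(word in name for word in good_words))
    bad = sum(1 for name in venue_names
              if any(word in name for word in bad_words))
    return good, bad, good - bad

# Hypothetical venue names, used only to illustrate the counting
sample_names = ['Sunset Beach Bar', 'Little Stars Kindergarten',
                'Pier Gift Shop', 'City Hall']
sample_good = ['Bar', 'Gift', 'Shop', 'Beach']
sample_bad = ['Kid', 'Kindergarten', 'School']

good, bad, score = business_score(sample_names, sample_good, sample_bad)
print(good, bad, score)
```

Here 'Sunset Beach Bar' counts once even though it matches two favourable words, which keeps a single venue from dominating the score.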


In [35]:
# Test the function on a limited sample (the first 10 locations) before running over the whole dataset
df_businesses = find_a_beach(output_cities_2, business_good, business_bad, test_run = 1)
df_businesses

Unnamed: 0,city,state_name,goodcount,badcount
0,North Palm Beach,Florida,15,0
1,Indian Rocks Beach,Florida,30,0
2,Belleair Beach,Florida,21,0
3,Daytona Beach,Florida,5,0
4,Bradenton Beach,Florida,27,0
5,Belleair Shores,Florida,33,0
6,Hillsboro Beach,Florida,13,0
7,Saint Pete Beach,Florida,29,0
8,West Palm Beach,Florida,4,0
9,Boynton Beach,Florida,5,0


In [36]:
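# Run the function over the dataset.
# Note: test_run defaults to 1, so only the first 10 cities are scored here; pass test_run = 0 to cover all rows.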
print(output_cities_2.shape)
df_businesses = find_a_beach(output_cities_2, business_good, business_bad)
df_businesses['local_bus_score'] = df_businesses['goodcount'] - df_businesses['badcount']
df_businesses

(38, 7)


Unnamed: 0,city,state_name,goodcount,badcount,local_bus_score
0,North Palm Beach,Florida,15,0,15
1,Indian Rocks Beach,Florida,30,0,30
2,Belleair Beach,Florida,21,0,21
3,Daytona Beach,Florida,5,0,5
4,Bradenton Beach,Florida,27,0,27
5,Belleair Shores,Florida,33,0,33
6,Hillsboro Beach,Florida,13,0,13
7,Saint Pete Beach,Florida,29,0,29
8,West Palm Beach,Florida,4,0,4
9,Boynton Beach,Florida,5,0,5


#### 5b. Generate a local business score based on good minus bad <a name="5b_BusScore"></a> ####

In [37]:
# Inner join the business information to the main data
output_cities_3 = pd.merge(output_cities_2, df_businesses, on=['city','state_name'])
output_cities_3 = output_cities_3.drop(['goodcount','badcount'], axis = 1) # Drop redundant fields
output_cities_3.head()

Unnamed: 0,index,city,state_name,lat,lng,population,local_area_pop,local_bus_score
0,0,North Palm Beach,Florida,26.8217,-80.0574,12993.0,305151.0,15
1,1,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,362321.0,30
2,2,Belleair Beach,Florida,27.924,-82.8365,1623.0,350273.0,21
3,3,Daytona Beach,Florida,29.1959,-81.0935,66645.0,202544.0,5
4,4,Bradenton Beach,Florida,27.4649,-82.6957,1262.0,785216.0,27


In [38]:
# Derive a final score: local area population divided by local business score.
# A lower score therefore indicates better business conditions relative to the local area population.
output_cities_3['final_score'] = output_cities_3['local_area_pop']/output_cities_3['local_bus_score']
output_cities_3.head()

Unnamed: 0,index,city,state_name,lat,lng,population,local_area_pop,local_bus_score,final_score
0,0,North Palm Beach,Florida,26.8217,-80.0574,12993.0,305151.0,15,20343.4
1,1,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,362321.0,30,12077.4
2,2,Belleair Beach,Florida,27.924,-82.8365,1623.0,350273.0,21,16679.7
3,3,Daytona Beach,Florida,29.1959,-81.0935,66645.0,202544.0,5,40508.8
4,4,Bradenton Beach,Florida,27.4649,-82.6957,1262.0,785216.0,27,29082.1
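
As a worked check of the score above, North Palm Beach has a local area population of 305151 and a business score of 15, giving 305151 / 15 = 20343.4. The sketch below repeats that arithmetic and adds a guard for a business score of zero or less, which the division above does not handle. The function name and the choice to return `None` in that case are assumptions of this sketch.

```python
def final_score(local_area_pop, local_bus_score):
    """Population per unit of business score; None when the score is not positive."""
    if local_bus_score <= 0:
        # Avoids division by zero and meaningless negative scores
        return None
    return local_area_pop / local_bus_score

# Rows taken from the table above
print(round(final_score(305151.0, 15), 1))   # North Palm Beach
print(round(final_score(362321.0, 30), 1))   # Indian Rocks Beach
print(final_score(100000.0, 0))              # hypothetical city with no favourable businesses
```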


In [39]:
# Order the dataset by final score
output_cities_3 = output_cities_3.sort_values('final_score', ascending = True)
output_cities_3 = output_cities_3.reset_index()
output_cities_3.head()

Unnamed: 0,level_0,index,city,state_name,lat,lng,population,local_area_pop,local_bus_score,final_score
0,5,5,Belleair Shores,Florida,27.9173,-82.8455,113.0,357210.0,33,10824.5
1,1,1,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,362321.0,30,12077.4
2,7,8,Saint Pete Beach,Florida,27.7215,-82.7383,9671.0,381800.0,29,13165.5
3,2,2,Belleair Beach,Florida,27.924,-82.8365,1623.0,350273.0,21,16679.7
4,0,0,North Palm Beach,Florida,26.8217,-80.0574,12993.0,305151.0,15,20343.4


### 6. Final Recommendation: Top 5 <a name="6_Final_Rec"></a> ###

In [40]:
# Top 5 recommendations
output_cities_4 = output_cities_3.head()
output_cities_4 = output_cities_4.drop(['level_0','index'], axis = 1) # Drop redundant fields
output_cities_4

Unnamed: 0,city,state_name,lat,lng,population,local_area_pop,local_bus_score,final_score
0,Belleair Shores,Florida,27.9173,-82.8455,113.0,357210.0,33,10824.5
1,Indian Rocks Beach,Florida,27.8963,-82.8443,4316.0,362321.0,30,12077.4
2,Saint Pete Beach,Florida,27.7215,-82.7383,9671.0,381800.0,29,13165.5
3,Belleair Beach,Florida,27.924,-82.8365,1623.0,350273.0,21,16679.7
4,North Palm Beach,Florida,26.8217,-80.0574,12993.0,305151.0,15,20343.4


In [41]:
# Refresh the map, which now shows only the top 5 recommended locations
map_usa_4 = folium.Map(location=[latitude, longitude], zoom_start=6)

# add markers to map
for lat, lng, state_name, city in zip(output_cities_4['lat'], output_cities_4['lng'], output_cities_4['state_name'], output_cities_4['city']):
    label = '{}, {}'.format(city, state_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_usa_4)  
    
map_usa_4