# Location Recommender for Maggie & Maggie's Beach Library and Bar #
__Developed by:__ Jono Pike 
<br>
__Development environment:__ Python 3.5 in IBM Watson Studio
<br>
__Hosted on:__ GitHub, https://github.com/Soltis12/

## Table of Contents
[1 Background](#1_Background)<br>
* [1a Project Requirements](#1a_ProjectRequirements)<br>
* [1b Requirements elicitation from stakeholders](#1b_RequirementsElicitation)<br>

[2 Prepare the development environment](#2_Prepare_DE)<br>
* [2a Import the required packages and libraries](#2a_Import_PL)<br>
* [2b Import and prepare the US Cities dataset](#2b_Import_USCities)<br>
---

### 1. Background<a name="1_Background"></a> ###

This is a final project for a Data Science course on Coursera, intended to demonstrate my understanding of the skills learned and my ability to apply them.
<br>
<br>
Two friends of mine, both named Maggie, want to open a business located on a beach. They want to create a Beach Library and Bar, where customers can relax on the beach with a drink while reading a rented book.
<br>
<br>
The business will henceforth be referred to as '_Maggie & Maggie's_'. We can refer to the stakeholders as <font color=red>Red Maggie</font> and <font color=green>Green Maggie</font>.

#### 1a. Project Requirements<a name="1a_ProjectRequirements"></a> ####

Requirements from __Coursera__ are:
 -  The development environment must be Python
 -  Tool developed must use Foursquare data

Requirements from __the Stakeholders__ are:
 -  The location must be a beach, in the contiguous USA
 -  US States the stakeholders will not consider are:
     -  California
     -  Nebraska
     -  Montana
     -  Alabama
     -  Iowa
     -  Kansas

#### 1b. Requirements elicitation from stakeholders<a name="1b_RequirementsElicitation"></a> ####

The following questions were put to <font color=red>Maggie</font> and <font color=green>Maggie</font> to further refine the requirements.
<br>
Their respective responses are shown in <font color=red>red</font> and <font color=green>green</font>:
<br>
1.  What is the budget you personally are willing to contribute towards starting the business?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
2.  Are there any locations specifically you have your heart set on?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
3.  How will the business operate during winter or poor weather?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
4.  Would you be willing to locate the business on a lakeshore, or must it be a coastline?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
5.  Please describe the types of customers you want to attract, and the types of customers you don’t want to attract.
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
6.  Describe the ‘feel’ you want the business to have, for yourself and the customer.
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
7.  What range of books do you want your library to specialise in?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
8.  Would you be serving food? If so, what type of food?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
9.  What additional factors do you consider essential for the business to be a success?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
10. What additional factors do you feel must be avoided for the business to be a success?
    <br>
    <font color=red>response</font>
    <br>
    <font color=green>response</font>
<br>
---

### 2. Prepare the development environment<a name="2_Prepare_DE"></a> ###

#### 2a. Import the required packages and libraries<a name="2a_Import_PL"></a>  ####

In [122]:
# Import the necessary libraries required for the tool
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# map rendering library
!conda install -c conda-forge folium=0.5.0 --yes
import folium 

# safely parse a string into a Python literal (e.g. a list stored as text)
from ast import literal_eval

# count occurrences in a dictionary
from collections import Counter

print ('Packages and libraries imported')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Packages and libraries imported


#### 2b. Import and prepare the US Cities dataset<a name="2b_Import_USCities"></a>  ####
Source of this data: https://simplemaps.com/data/us-cities

In [47]:
# Load the US Cities data from GitHub repository
usa_cities_raw = pd.read_csv('https://raw.githubusercontent.com/Soltis12/coursera_capstone/master/09_usa_cities.csv')
usa_cities_raw.head()

Unnamed: 0,city,city_ascii,state_id,state_name,county_fips,county_name,lat,lng,population,population_proper,density,source,incorporated,timezone,zips,id
0,Prairie Ridge,Prairie Ridge,WA,Washington,53053,Pierce,47.1443,-122.1408,,,1349.8,polygon,False,America/Los_Angeles,98360 98391,1840037882
1,Edison,Edison,WA,Washington,53057,Skagit,48.5602,-122.4311,,,127.4,polygon,False,America/Los_Angeles,98232,1840017314
2,Packwood,Packwood,WA,Washington,53041,Lewis,46.6085,-121.6702,,,213.9,polygon,False,America/Los_Angeles,98361,1840025265
3,Wautauga Beach,Wautauga Beach,WA,Washington,53035,Kitsap,47.5862,-122.5482,,,261.7,point,False,America/Los_Angeles,98366,1840037725
4,Harper,Harper,WA,Washington,53035,Kitsap,47.5207,-122.5196,,,342.1,point,False,America/Los_Angeles,98366,1840037659


In [48]:
# Variables that relate to this dataset from stakeholder requirements:

# List of excluded states provided by stakeholders; names must match the dataset's state_name values exactly
excluded_states = ['California','Nebraska','Alabama','Iowa','Kansas','Montana']

# Minimum population for the city to be considered
min_population = 100000

# Maximum distance for beach to be from city location in metres
max_distance = 2000

# Location requested
wanted_location = 'beach'

print('Variables defined')

Variables defined
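
The `max_distance` threshold above is expressed in metres from the city's coordinates. As a rough illustration of what that threshold means, the sketch below computes a great-circle (haversine) distance between two points and compares it with the threshold. The function names, the Earth-radius constant, and the sample coordinates are assumptions made for this example; they are not part of the original notebook.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (an approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_max_distance(city, venue, max_distance=2000):
    """True if venue (lat, lng) lies within max_distance metres of city (lat, lng)."""
    return haversine_m(city[0], city[1], venue[0], venue[1]) <= max_distance

# Hypothetical coordinates, 0.01 degrees of latitude apart
city = (47.60, -122.33)
venue = (47.61, -122.33)
print(within_max_distance(city, venue))
```

In the notebook itself the distance filtering is delegated to the Foursquare `radius` parameter, so a helper like this would only be useful for spot-checking individual results.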


In [49]:
# Refine the data. Only include fields that are relevant.
usa_cities = usa_cities_raw[['city','state_name','lat','lng','population']]
usa_cities.head()

Unnamed: 0,city,state_name,lat,lng,population
0,Prairie Ridge,Washington,47.1443,-122.1408,
1,Edison,Washington,48.5602,-122.4311,
2,Packwood,Washington,46.6085,-121.6702,
3,Wautauga Beach,Washington,47.5862,-122.5482,
4,Harper,Washington,47.5207,-122.5196,


In [29]:
# Exclude any rows with a null, as all fields require populated data.
usa_cities = usa_cities.dropna()
usa_cities.head()

Unnamed: 0,city,state_name,lat,lng,population
6,Kahlotus,Washington,46.6436,-118.5566,189.0
8,Washtucna,Washington,46.7539,-118.3104,195.0
10,Toledo,Washington,46.4412,-122.8494,738.0
12,Renton,Washington,47.4757,-122.1904,100953.0
13,Chehalis,Washington,46.6649,-122.966,7498.0


In [50]:
# Get the distinct state names from the dataset, so the list of excluded states can exactly match the names in the dataset.
distinct_states = usa_cities['state_name'].unique()
distinct_states

array(['Washington', 'Virginia', 'Delaware', 'District of Columbia',
       'Wisconsin', 'West Virginia', 'Hawaii', 'Florida', 'Wyoming',
       'New Hampshire', 'New Jersey', 'New Mexico', 'Texas', 'Louisiana',
       'North Carolina', 'North Dakota', 'Nebraska', 'Tennessee',
       'New York', 'Pennsylvania', 'California', 'Nevada', 'Puerto Rico',
       'Colorado', 'Virgin Islands', 'Alaska', 'Alabama', 'Arkansas',
       'Vermont', 'Illinois', 'Georgia', 'Indiana', 'Iowa', 'Oklahoma',
       'Arizona', 'Idaho', 'Connecticut', 'Maine', 'Maryland',
       'Massachusetts', 'Ohio', 'Utah', 'Missouri', 'Minnesota',
       'Michigan', 'Rhode Island', 'Kansas', 'Montana', 'Mississippi',
       'South Carolina', 'Kentucky', 'Oregon', 'South Dakota'], dtype=object)
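
Comparing the excluded list against this array by eye is error-prone. A minimal sketch of an automated check follows: it reports any excluded name that does not appear in the dataset. The helper name `unmatched_names` and the abbreviated `sample_states` list are assumptions for illustration; in the notebook the second argument would be `distinct_states`.

```python
def unmatched_names(excluded, available):
    """Return the excluded names that are absent from the available names, sorted."""
    return sorted(set(excluded) - set(available))

excluded_states = ['California', 'Nebraska', 'Alabama', 'Iowa', 'Kansas', 'Montana']

# Abbreviated stand-in for usa_cities['state_name'].unique()
sample_states = ['Washington', 'Nebraska', 'California', 'Alabama',
                 'Iowa', 'Kansas', 'Montana']

print(unmatched_names(excluded_states, sample_states))  # []
print(unmatched_names(['Calfornia'], sample_states))    # ['Calfornia']
```

An empty list means every excluded state will actually be matched by the `isin` filter; a misspelling shows up immediately.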

In [51]:
# Refine the dataset to remove the excluded states
usa_cities = usa_cities[~usa_cities.state_name.isin(excluded_states)]

# Check the output to ensure the excluded states have been removed from the dataframe
distinct_states_2 = usa_cities['state_name'].unique()
distinct_states_2

array(['Washington', 'Virginia', 'Delaware', 'District of Columbia',
       'Wisconsin', 'West Virginia', 'Hawaii', 'Florida', 'Wyoming',
       'New Hampshire', 'New Jersey', 'New Mexico', 'Texas', 'Louisiana',
       'North Carolina', 'North Dakota', 'Tennessee', 'New York',
       'Pennsylvania', 'Nevada', 'Puerto Rico', 'Colorado',
       'Virgin Islands', 'Alaska', 'Arkansas', 'Vermont', 'Illinois',
       'Georgia', 'Indiana', 'Oklahoma', 'Arizona', 'Idaho', 'Connecticut',
       'Maine', 'Maryland', 'Massachusetts', 'Ohio', 'Utah', 'Missouri',
       'Minnesota', 'Michigan', 'Rhode Island', 'Mississippi',
       'South Carolina', 'Kentucky', 'Oregon', 'South Dakota'], dtype=object)

In [145]:
# Refine the dataset to remove any cities too small to be considered
usa_cities = usa_cities[usa_cities['population'] > min_population]
usa_cities = usa_cities.reset_index()
usa_cities = usa_cities.drop(['index'], axis = 1) # Drop the old index
usa_cities.head()

Unnamed: 0,level_0,city,state_name,lat,lng,population
0,0,Renton,Washington,47.4757,-122.1904,100953.0
1,1,Seattle,Washington,47.6217,-122.3238,3541236.0
2,2,Yakima,Washington,46.5926,-120.5492,133687.0
3,3,Kennewick,Washington,46.1979,-119.1732,229624.0
4,4,Kent,Washington,47.3887,-122.2128,127514.0


In [53]:
# Identify number of cities at this stage, with a population over threshold, not in excluded states.
# Further refinements to go: cities near a shoreline.
usa_cities.shape

(322, 5)

#### 2c. Link the refined cities data to the Foursquare data ####

In [54]:
# The code was removed by Watson Studio for sharing.

In [85]:
# First, test the code on a single city that I know has a beach
# Get the index number of Miami, Florida
miami_df = usa_cities[usa_cities['city'] == 'Miami']
miami_df

Unnamed: 0,city,state_name,lat,lng,population
3640,Miami,Florida,25.784,-80.2102,6247425.0


In [165]:
# Create a blank dataframe where city and has beach flag can be appended
city_has_beach = pd.DataFrame(columns=['beachnum','city','state_name',])
city_has_beach

Unnamed: 0,beachnum,city,state_name


In [86]:
# First, test the code on a single city that I know has a beach
# Get the details of the sample city
city_sample = 3640 # Index number for Miami

city_latitude = usa_cities.loc[city_sample, 'lat'] # city latitude value
city_longitude = usa_cities.loc[city_sample, 'lng'] # city longitude value
city_state = usa_cities.loc[city_sample, 'state_name'] # state name
city_name = usa_cities.loc[city_sample, 'city'] # city name

print('Latitude and longitude values of {}, {} are {}, {}.'.format(city_name,
                                                               city_state, 
                                                               city_latitude, 
                                                               city_longitude))

Latitude and longitude values of Miami, Florida are 25.784, -80.2102.


In [136]:
# Build the Foursquare call
latitude = city_latitude
longitude = city_longitude
radius = max_distance
LIMIT = 5
cat_id = '4bf58dd8d48988d1e2941735'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, cat_id, LIMIT)

# Call the data
results = requests.get(url).json()
results_string = json.dumps(results)
results_string

'{"meta": {"requestId": "5c20d729dd579735ba654aef", "code": 200}, "response": {"venues": [{"hasPerk": false, "categories": [{"primary": true, "pluralName": "Beaches", "id": "4bf58dd8d48988d1e2941735", "shortName": "Beach", "name": "Beach", "icon": {"suffix": ".png", "prefix": "https://ss3.4sqi.net/img/categories_v2/parks_outdoors/beach_"}}], "id": "53c4b779498e5269b662d16a", "name": "Miami Beach", "referralId": "v-1545656105", "location": {"state": "FL", "formattedAddress": ["Miami, FL", "United States"], "labeledLatLngs": [{"lat": 25.78909864412232, "lng": -80.2217425770895, "label": "display"}], "distance": 1288, "lng": -80.2217425770895, "lat": 25.78909864412232, "cc": "US", "city": "Miami", "country": "United States"}}, {"hasPerk": false, "categories": [{"primary": true, "pluralName": "Beaches", "id": "4bf58dd8d48988d1e2941735", "shortName": "Beach", "name": "Beach", "icon": {"suffix": ".png", "prefix": "https://ss3.4sqi.net/img/categories_v2/parks_outdoors/beach_"}}], "id": "4d90a
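
The request URL above is assembled with a long format string. One possible alternative, sketched below, builds the same query string from named parameters with the standard library's `urlencode`, which also escapes any awkward characters. The function name `build_search_url` and the placeholder credentials and version string are assumptions for illustration; the real `CLIENT_ID`, `CLIENT_SECRET` and `VERSION` values are hidden in the notebook.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version,
                     lat, lng, radius, category_id, limit):
    """Build a venues/search URL using the same parameters as the call above."""
    params = [
        ('client_id', client_id),
        ('client_secret', client_secret),
        ('ll', '{},{}'.format(lat, lng)),
        ('v', version),
        ('radius', radius),
        ('categoryId', category_id),
        ('limit', limit),
    ]
    # safe=',' keeps the comma in the ll parameter unescaped
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials and version string (hypothetical values)
example_url = build_search_url('ID', 'SECRET', '20180605',
                               25.784, -80.2102, 2000,
                               '4bf58dd8d48988d1e2941735', 5)
print(example_url)
```

When using `requests`, a similar effect can be had by passing a dictionary through the `params` argument of `requests.get` instead of formatting the URL by hand.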

In [166]:
# Search the Foursquare string for the word 'Beach'
beaches_count = results_string.count('"shortName": "Beach"')
beaches_count

2
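
Counting a substring in the serialised response works here, but it depends on the exact spacing produced by `json.dumps`. A sketch of a more structural alternative is shown below: it walks the parsed response and counts venues that carry the beach category ID. The function name `count_beaches` and the trimmed `sample` response are assumptions for illustration; the key names follow the response shown above.

```python
BEACH_CATEGORY_ID = '4bf58dd8d48988d1e2941735'

def count_beaches(results, category_id=BEACH_CATEGORY_ID):
    """Count venues in a parsed search response that have the given category ID."""
    venues = results.get('response', {}).get('venues', [])
    return sum(
        1 for venue in venues
        if any(cat.get('id') == category_id
               for cat in venue.get('categories', []))
    )

# Trimmed, partly hypothetical response with one beach and one non-beach venue
sample = {
    'meta': {'code': 200},
    'response': {'venues': [
        {'name': 'Miami Beach',
         'categories': [{'id': BEACH_CATEGORY_ID, 'shortName': 'Beach'}]},
        {'name': 'Example Cafe',
         'categories': [{'id': 'hypothetical-category-id', 'shortName': 'Cafe'}]},
    ]},
}

print(count_beaches(sample))  # 1
```

Because missing keys fall back to empty containers, an error response without a `venues` list simply yields a count of zero.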

In [167]:
# Pass the row elements as key value pairs to append() function 
city_has_beach = city_has_beach.append({'city' : city_name , 'state_name' : city_state, 'beachnum': beaches_count} , ignore_index=True)
city_has_beach

Unnamed: 0,beachnum,city,state_name
0,2,Miami,Florida


In [243]:
# Create a function to iterate through all cities in the usa_cities DataFrame to identify count of beaches
def find_a_beach(the_dataframe,
                 test_run = 1): # Set test_run to 1 to limit the number of iterations while testing
    
    # Create a blank dataframe where city and has beach flag can be appended.
    city_has_beach = pd.DataFrame(columns=['beachnum','city','state_name'])
    
    # Identify the number of required iterations: one per row in the DataFrame.
    # Assumes a default 0..n-1 index, as produced by reset_index() earlier.
    cities = len(the_dataframe)
    
    # Number of iterations limited to ten if test run is true.
    if test_run == 1:
        totalruns = min(10, cities)
    else:
        totalruns = cities

    # Loop for each city in the DataFrame.
    for i in range(totalruns):
        
        # Define the variables for the Foursquare call
        city_sample = i # Index number
        city_latitude = the_dataframe.loc[city_sample, 'lat'] # city latitude value
        city_longitude = the_dataframe.loc[city_sample, 'lng'] # city longitude value
        city_state = the_dataframe.loc[city_sample, 'state_name'] # state name
        city_name = the_dataframe.loc[city_sample, 'city'] # city name
        
        # Build the Foursquare call
        latitude = city_latitude
        longitude = city_longitude
        radius = max_distance
        LIMIT = 5
        cat_id = '4bf58dd8d48988d1e2941735'
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, cat_id, LIMIT)
        
        # Call the data
        results = requests.get(url).json()
        results_string = json.dumps(results)
        
        # Search the Foursquare string for the word 'Beach'
        beaches_count = results_string.count('"shortName": "Beach"')
        
        # Pass the row elements as key value pairs to append() function 
        city_has_beach = city_has_beach.append({'city' : city_name , 'state_name' : city_state, 'beachnum': beaches_count} , ignore_index=True)
    
    # Present the dataframe
    global output
    output = city_has_beach[city_has_beach['beachnum'] > 0]
    output = output.reset_index() # Re-index
    output = output.drop(['index'], axis = 1) # Drop the old index
    output['index_col'] = output.index
    result_count = (output['index_col'].max()+1)
    
    print('{} cities were returned from your search'.format(result_count))
    
    return output

print('Function created')

Function created
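
The function above mixes looping, the network call, and result filtering, which makes it hard to test without calling Foursquare. As a sketch of one way to separate those concerns, the example below takes the beach lookup as an argument, so a stub can stand in for the API during testing. The names `cities_with_beach_counts` and `fake_beach_counter`, and the stub's rule, are assumptions for illustration only.

```python
def cities_with_beach_counts(cities, beach_counter):
    """Return one row per city that has at least one beach.

    cities: iterable of dicts with 'city', 'state_name', 'lat' and 'lng' keys.
    beach_counter: callable taking (lat, lng) and returning a beach count.
    """
    rows = []
    for city in cities:
        beachnum = beach_counter(city['lat'], city['lng'])
        if beachnum > 0:
            rows.append({'beachnum': beachnum,
                         'city': city['city'],
                         'state_name': city['state_name']})
    return rows

def fake_beach_counter(lat, lng):
    """Stub standing in for the Foursquare call (hypothetical rule)."""
    return 2 if lat < 30 else 0

sample_cities = [
    {'city': 'Miami', 'state_name': 'Florida', 'lat': 25.784, 'lng': -80.2102},
    {'city': 'Yakima', 'state_name': 'Washington', 'lat': 46.5926, 'lng': -120.5492},
]

for row in cities_with_beach_counts(sample_cities, fake_beach_counter):
    print(row['city'], row['beachnum'])
```

With pandas, `usa_cities.to_dict('records')` would supply the list of dictionaries, and `pd.DataFrame(rows)` would turn the result back into a DataFrame in a single step rather than appending row by row.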


In [245]:
# Run the function against every city (test_run = 0 disables the iteration limit)
cities_with_beaches = find_a_beach(usa_cities,0)
cities_with_beaches

117 cities were returned from your search


Unnamed: 0,beachnum,city,state_name,index_col
0,1,Renton,Washington,0
1,3,Seattle,Washington,1
2,1,Olympia,Washington,2
3,1,Bremerton,Washington,3
4,1,Bellevue,Washington,4
5,4,Bellingham,Washington,5
6,2,Spokane,Washington,6
7,5,Richmond,Virginia,7
8,1,Alexandria,Virginia,8
9,1,Lynchburg,Virginia,9


In [255]:
# Inner Join the cities with beaches to the usa cities dataset
output_cities = pd.merge(cities_with_beaches, usa_cities, on=['city','state_name'])
output_cities = output_cities.drop(['beachnum','index_col','level_0'], axis = 1) # Drop redundant fields
output_cities.head()

Unnamed: 0,city,state_name,lat,lng,population
0,Renton,Washington,47.4757,-122.1904,100953.0
1,Seattle,Washington,47.6217,-122.3238,3541236.0
2,Olympia,Washington,47.0417,-122.8958,194532.0
3,Bremerton,Washington,47.5436,-122.7121,214549.0
4,Bellevue,Washington,47.5953,-122.155,141400.0


#### 2d. Plot the current list onto a map to summarise progress so far ####

In [258]:
# create a map centred on the latitude and longitude values defined earlier
map_usa = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, state_name, city in zip(output_cities['lat'], output_cities['lng'], output_cities['state_name'], output_cities['city']):
    label = '{}, {}'.format(city, state_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_usa)  
    
map_usa
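
The map above is centred on whichever `latitude` and `longitude` values happen to be defined, which may not suit a country-wide view. A rough sketch of one alternative is to centre the map on the mean of the plotted coordinates. The helper name `mean_centre` is an assumption for illustration, and a simple arithmetic mean is only a crude approximation of a geographic centre.

```python
def mean_centre(points):
    """Return [mean_lat, mean_lng] for an iterable of (lat, lng) pairs."""
    points = list(points)
    if not points:
        raise ValueError('at least one point is required')
    mean_lat = sum(p[0] for p in points) / len(points)
    mean_lng = sum(p[1] for p in points) / len(points)
    return [mean_lat, mean_lng]

# First three rows of the merged output, used here as sample data
sample_points = [
    (47.4757, -122.1904),  # Renton
    (47.6217, -122.3238),  # Seattle
    (47.0417, -122.8958),  # Olympia
]

centre = mean_centre(sample_points)
print(centre)
```

The result could then be passed to `folium.Map(location=centre, zoom_start=4)`, with `zip(output_cities['lat'], output_cities['lng'])` as the input; the zoom level of 4 is a guess that would need adjusting by eye.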