# Introduction #

People typically visit more than one place when they're out: dinner and a movie, an oil change and a pizza pickup, two different fast food places because... kids. Most mapping software shows you either everything or one thing at a time, and neither is much use if I'm looking for Thai places within walking distance of an oil change place. I search for oil change places, then search for Thai while trying to remember which oil change place I thought would work, which is next to impossible. Then it turns out there's no Thai place nearby, and I have to recenter myself on a DIFFERENT oil change place.





# Data #

I use user input (the venues sought and a starting location) and the Foursquare API, and build dataframes from the relevant results. Plotting those locations on a map lets a user easily see which venue types are going to be the limiting factor, and where there are clusters that serve the user's individual needs.

# Methodology #

For the purposes of grading, I hard-coded the user inputs. I also put in a data entry cell, if someone wants to run that block. I ran several sets of calls to figure out the right parameters for the API call to get useful information (e.g. changing the intent to "browse" was useful), and tried it with many different use cases.

Then the data cleaning continued: I realized that searching for "salad" or "fast food" would return some weird results. So I pulled out the primary category ids from the first search, then reran the search with that category. That reduced the number of times I would get things like "Salado Creek" when searching for "salad", and gave more applicable results.

Then I realized that was better, but still kind of sucked. So now I first check whether a category matches the search term, and if one does, I search by that category. Otherwise I fall back to a plain query, which seems to just look at the venue name. A query for "fast food" would turn up "Fast House Sales" for no discernible reason, but searching by the category "fast food" gave WAY better results.

Also, I had to kludge out the capitalization issue. My category csv and the user input didn't always match in case, and since users are users and may (read: will) ignore case suggestions, I forced everything to lowercase.
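
The category-first lookup with a lowercase comparison can be sketched roughly as follows. This is a minimal illustration rather than the notebook's own code: the `CATEGORIES` table, its ids, and the `plan_search` helper are hypothetical stand-ins for the category csv and the real Foursquare ids.

```python
# Hypothetical miniature of the category table; the ids are made up.
CATEGORIES = [
    {"category_id": "cat-001", "category_name": "Fast Food Restaurant"},
    {"category_id": "cat-002", "category_name": "Salad Place"},
]

def plan_search(term):
    """Return ('categoryId', id) if a category name contains the term,
    otherwise ('query', term) so the caller falls back to a name search."""
    needle = term.strip().lower()  # force user input to lowercase
    for row in CATEGORIES:
        if needle in row["category_name"].lower():  # lowercase the table side too
            return ("categoryId", row["category_id"])
    return ("query", needle)

print(plan_search("Fast Food"))   # matches a category despite the capitals
print(plan_search("oil change"))  # no category match, so fall back to a query
```

The returned pair says which API parameter to send, so the caller never mixes a category id and a free-text query in the same request.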

# Results #

I found that the quality of results varied dramatically with the search parameters. Even so, putting multiple venue types on a single map lets a user visually identify several important things: clusters of desired capabilities; what is sparse in the area of interest (meaning maybe I should give up on Ethiopian food, and settle for Thai); and the details of each specific venue, shown on hover. This combination of capabilities is compelling, and has high potential utility to the user.

# Discussion #

Originally, I had thought to find the nearest clusters, and did all the work to cluster the multiple inputs based on distance using KNN. Then I realized that the odds of "I don't like X" multiply with the number of venues searched, which would make it cumbersome for the user, sort of like Russian roulette. Presenting the different series on a map instead lets the user cluster them visually, and also easily determine what the limiting venue is. If they can see that there are 4 blue dots and 40 red dots, they may need to be a little less fussy about the blue dots. So although this didn't end up being perfectly aligned with the course, I am happy with the results.
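
The "4 blue dots and 40 red dots" idea can also be stated in a couple of lines. The sketch below is illustrative only: `limiting_venue` is a hypothetical helper, and the counts are made-up numbers, not results from the notebook.

```python
def limiting_venue(counts):
    """Return the venue type with the fewest results (first one wins a tie)."""
    return min(counts, key=counts.get)

# Illustrative counts only: number of map markers per search term.
counts = {"thai": 40, "ethiopian": 4, "oil change": 12}
print(limiting_venue(counts))
```

Whichever series comes back is the one the user probably has to be flexible about.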

A better developer could add more ways to constrain or refine the data. For example, a search for "Burgers", whether by category ID or by the term "burgers", doesn't return In-N-Out Burger, which seems a glaring omission. "Burger" does, however. So if I were to spend more time on it, I would figure out how to create equivalent search terms and search that way, or at least remove plurals.
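
One possible starting point for the plural problem is to search with both the original term and a naively singularized form. This is a rough sketch under simple assumptions: `search_variants` is a hypothetical helper, and its suffix rules only cover the easy English cases.

```python
def search_variants(term):
    """Return the lower-cased term plus a naive singular form, if one applies."""
    base = term.strip().lower()
    variants = [base]
    if base.endswith("ies") and len(base) > 4:
        variants.append(base[:-3] + "y")      # bakeries -> bakery
    elif base.endswith("s") and not base.endswith("ss") and len(base) > 3:
        variants.append(base[:-1])            # burgers -> burger
    return variants

print(search_variants("Burgers"))
print(search_variants("bakeries"))
print(search_variants("Thai"))
```

The rules are deliberately crude and will get some words wrong ("movies" becomes "movy"), so the singular form is best treated as an extra search to try, not a replacement for the user's term.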

Another issue arises when a search term matches multiple categories and my algorithm picks the wrong one. A search for "medical" will only return things from the category "medical center", so if they're looking for "medical supply" or "medical school", too bad. A better programmer would find a way to mitigate that issue, but I'm just not there yet.
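
One way to soften the multiple-match problem would be to collect every matching category instead of only the first, and then let the user choose (or search all of them). A minimal sketch, assuming a hypothetical list of category names in place of the real csv:

```python
# Hypothetical category names standing in for the category csv.
CATEGORY_NAMES = [
    "medical center",
    "medical supply store",
    "medical school",
    "thai restaurant",
]

def matching_categories(term, names=CATEGORY_NAMES):
    """Return every category name that contains the term, in table order."""
    needle = term.strip().lower()
    return [name for name in names if needle in name.lower()]

matches = matching_categories("Medical")
print(matches)
if len(matches) > 1:
    print("More than one category matched; ask the user which one they meant.")
```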

I would also let the user constrain by distance, to get more results that are close by. I'm not sure how Foursquare selects the 50 to return, but I would work on that. The results I got were more widely distributed than made sense, and of variable utility.
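
A distance constraint could be applied on the client side, after the results come back, by computing the great-circle distance from the starting point and dropping anything too far away. The sketch below is only an illustration of that idea: `haversine_km` and `within_km` are hypothetical helpers, and the venue dictionaries use made-up coordinates. The search API may also offer a server-side radius option, which would be worth checking in its documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_km(venues, origin, max_km):
    """Keep only the venues no farther than max_km from origin (lat, lng)."""
    lat0, lng0 = origin
    return [v for v in venues
            if haversine_km(lat0, lng0, v["lat"], v["lng"]) <= max_km]

# Made-up venues for illustration.
venues = [
    {"name": "near", "lat": 0.0, "lng": 0.01},
    {"name": "far", "lat": 1.0, "lng": 0.0},
]
print([v["name"] for v in within_km(venues, (0.0, 0.0), 5)])
```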

# Conclusion #

Getting Foursquare data into a format that is useful to the user raises several issues, the largest of which is matching user intent with the result sets. Although this project is not perfect in what it accomplishes, it was challenging, exciting, and fun, with an output I will keep to use later. I will likely continue to play with it to narrow the result sets to a more reasonable distance, make the map show up bigger in the notebook, and see if I can figure out how to draw circles around all the clusters with relevant information to draw the user's eye.

In [15]:
#import a bunch of libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
# JSON results are flattened into dataframes further down with pd.json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


In [9]:
#document four square credentials
CLIENT_ID = 'M032WMNUKYVUT31JVRWTR3IOUSEFPNYYLD0UPYUOOMDZ1Z5Z' # your Foursquare ID
CLIENT_SECRET = 'Z1CUL0XMODVSQS4VECBDONVWUGSDSFY10NYA0WPNWRWU3E5N' # your Foursquare Secret
ACCESS_TOKEN = '12AQLWDMLGLET2D5XRNKOQ2UNU43XGCTM0GJ5GR0HII3QMXU' # your FourSquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
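
Credentials like these can also be read from environment variables, so they never have to be written into the notebook. A minimal sketch, assuming the three values are stored under the hypothetical variable names shown; `load_credentials` is an illustrative helper, not part of any Foursquare library.

```python
import os

# Hypothetical environment variable names; use whatever names you set.
CREDENTIAL_VARS = (
    "FOURSQUARE_CLIENT_ID",
    "FOURSQUARE_CLIENT_SECRET",
    "FOURSQUARE_ACCESS_TOKEN",
)

def load_credentials(env=os.environ):
    """Return (client_id, client_secret, access_token) read from env.

    Raises KeyError naming any variable that is missing or empty."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)

# Demonstration with a plain dict so the example runs without real secrets.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
    "FOURSQUARE_ACCESS_TOKEN": "demo-token",
}
print(load_credentials(demo_env))
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN = load_credentials()`, once the variables are set in the environment that launches Jupyter.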


In [249]:
#create a pandas dataframe with the four square categories

url = "https://github.com/Gkoliver/Coursera_Capstone/raw/main/category_codes.csv"
# on_bad_lines="skip" replaces the removed error_bad_lines=False (needs pandas >= 1.3)
codes_df = pd.read_csv(url, sep=",", on_bad_lines="skip")



In [250]:
# now force the category name into lower case
codes_df['category_name'] = codes_df['category_name'].str.lower()

In [251]:
#this is to define the initial inputs, but the app would clearly allow user input
#this is placeholder data, if someone doesn't want to run it. 

inputAddress = "78259"
country = "US"
input1 = "Park"
input2 = "Hamburger"
input3 = "Chinese"
input4 = ""

In [73]:
#if you're looking at this and want to put in your own inputs, run this cell

inputAddress = input("Please enter the starting point (address, zip):")
country = input ("Please enter your country:")
input1 = input ("Please enter what you're looking for:")
input2 = input ("Please enter what <else> you're looking for: (hit enter if nothing)")
#reset the optional inputs so values from an earlier cell or run don't linger
input3 = ""
input4 = ""
if len (input2) > 1:
    input3 = input ("Anything else you're trying to find?(hit enter if nothing)")
    if len (input3) > 1:
        input4 = input ("Last one: any final thoughts?")



Please enter the starting point (address, zip):78259
Please enter your country:US
Please enter what you're looking for:fast food
Please enter what <else> you're looking for: (hit enter if nothing)post office
Anything else you're trying to find?(hit enter if nothing)


In [264]:
#for each input, see if there's a category that matches. if there is, search based on category code, not on search query
catCode1 = ""
catCode2 = ""
catCode3 = ""
catCode4 = ""


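def getCatCode (term):
    #"term" instead of "input", so the built-in input() isn't shadowed
    #regex=False makes this a plain substring match; na=False skips missing category names
    a = codes_df[codes_df['category_name'].str.contains(term.lower(), regex=False, na=False)]['category_id']
    if a.empty:
        return ""
    else:
        #only the first matching category is used
        return a.array[0]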

if len (input1) > 1:
    catCode1 = getCatCode (input1)
if len (input2) > 1:
    catCode2 = getCatCode (input2)
if len (input3) > 1:
    catCode3 = getCatCode (input3)
if len (input4) > 1:
    catCode4 = getCatCode (input4)


### Create a geocoder instance, and get the long/lat of the start point ###

In [265]:
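geolocator = Nominatim(user_agent="foursquare_agent")
srch = "{}, {}".format(inputAddress, country) #a single query string, e.g. "78259, US"
location = geolocator.geocode(srch)
if location is None:
    #geocode returns None when nothing matches the query
    raise ValueError("Could not geocode the starting point: " + srch)
latitude = location.latitude
longitude = location.longitude
intent = 'browse'
ll = latitude, longitude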


In [254]:
#get the result sets, convert them into dataframes, and drop venues that don't have a name
SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'

def searchVenues (term, catCode):
    #search by category id if one matched, otherwise fall back to a query on the term
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'oauth_token': ACCESS_TOKEN,
        'v': VERSION,
        'll': '{},{}'.format(latitude, longitude),
        'limit': LIMIT,
        'intent': intent,
    }
    if len(catCode) > 0:
        params['categoryId'] = catCode
    else:
        params['query'] = term
    #requests builds and url-encodes the query string, so terms with spaces (e.g. "fast food") are sent correctly
    results = requests.get(SEARCH_URL, params=params).json()
    df = pd.json_normalize(results['response']['venues'])
    if "name" in df.columns:
        df.dropna(subset = ["name"], inplace=True)
    return df

if len (input1) > 1:
    df_1 = searchVenues (input1, catCode1)
if len (input2) > 1:
    df_2 = searchVenues (input2, catCode2)
if len (input3) > 1:
    df_3 = searchVenues (input3, catCode3)
if len (input4) > 1:
    df_4 = searchVenues (input4, catCode4)
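
A search term such as "fast food" contains a space, and every parameter in a query string needs an `=` between its name and its value, so it can help to see what a fully encoded search URL looks like. The sketch below uses only the standard library; `build_search_url` is a hypothetical helper, the coordinates are illustrative, and the credential parameters are left out for brevity.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(params):
    """Return BASE_URL with params appended as an encoded query string."""
    return BASE_URL + "?" + urlencode(params)

# Illustrative values only.
url = build_search_url({
    "ll": "29.6,-98.4",
    "query": "fast food",
    "intent": "browse",
    "limit": 50,
})
print(url)
```

Each pair is written as `name=value`, the space in the term becomes `+`, and the comma in `ll` is percent-encoded.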


In [255]:
#this is metric fricktons of code to create a legend
from branca.element import Template, MacroElement

templateStart = """
{% macro html(this, kwargs) %}

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>jQuery UI Draggable - Default functionality</title>
  <link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">

  <script src="https://code.jquery.com/jquery-1.12.4.js"></script>
  <script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
  
  <script>
  $( function() {
    $( "#maplegend" ).draggable({
                    start: function (event, ui) {
                        $(this).css({
                            right: "auto",
                            top: "auto",
                            bottom: "auto"
                        });
                    }
                });
});

  </script>
</head>
<body>

 
<div id='maplegend' class='maplegend' 
    style='position: absolute; z-index:9999; border:2px solid grey; background-color:rgba(255, 255, 255, 0.8);
     border-radius:6px; padding: 10px; font-size:14px; right: 20px; bottom: 20px;'>
     
<div class='legend-title'>Legend (draggable) </div>
<div class='legend-scale'>
  <ul class='legend-labels'>"""


#function to create the legend items

#create the legend for the list items
lstart = "<li><span style='background:"
lmid = ";opacity:0.7;'></span>"
lend = "</li>"
def litem (color, txt):
    strLitem = lstart + color +  lmid + txt + lend
    return strLitem

templateMid = " "
if len (input1) > 1:
    templateMid = templateMid + litem ("red", input1)
if len (input2) > 1:
    templateMid = templateMid + litem ("green", input2)
if len (input3) > 1:
    templateMid = templateMid + litem ("orange", input3) 
if len (input4) > 1:
    templateMid = templateMid + litem ("purple", input4)    
    
   #needs to look like this <li><span style='background:red;opacity:0.7;'></span>Big</li>
    #<li><span style='background:orange;opacity:0.7;'></span>Medium</li>
    #<li><span style='background:green;opacity:0.7;'></span>Small</li>

templateEnd = """
  </ul>
</div>
</div>
 
</body>
</html>

<style type='text/css'>
  .maplegend .legend-title {
    text-align: left;
    margin-bottom: 5px;
    font-weight: bold;
    font-size: 90%;
    }
  .maplegend .legend-scale ul {
    margin: 0;
    margin-bottom: 5px;
    padding: 0;
    float: left;
    list-style: none;
    }
  .maplegend .legend-scale ul li {
    font-size: 80%;
    list-style: none;
    margin-left: 0;
    line-height: 18px;
    margin-bottom: 2px;
    }
  .maplegend ul.legend-labels li span {
    display: block;
    float: left;
    height: 16px;
    width: 30px;
    margin-right: 5px;
    margin-left: 0;
    border: 1px solid #999;
    }
  .maplegend .legend-source {
    font-size: 80%;
    color: #777;
    clear: both;
    }
  .maplegend a {
    color: #777;
    }
</style>
{% endmacro %}"""

macro = MacroElement()
macro._template = Template(templateStart + templateMid +templateEnd )

## venues_map.get_root().add_child(macro) #this is to add it in
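
Because the legend text comes straight from user input and is pasted into HTML, characters such as `<` or `&` in a search term could garble the legend. A minimal sketch of an escaped legend item, using the standard library's `html.escape`; `legend_item` is a hypothetical variant of the notebook's `litem` helper.

```python
from html import escape

def legend_item(color, text):
    """Build one legend <li>, escaping the label so it is shown literally."""
    return "<li><span style='background:{};opacity:0.7;'></span>{}</li>".format(
        color, escape(text)
    )

print(legend_item("red", "fish & chips <takeaway>"))
```

This only covers HTML special characters in the label; the color is assumed to be one of the fixed names used in the notebook.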



In [263]:
#now see if I can map these and label the "clusters" of inputs
width, height = '100%', 350
venues_map = folium.Figure(width=width, height='100%')
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) # generate map centred around the start point

# add a black circle marker to represent the center
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='black',
    popup='Starting point: ' + inputAddress,
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

##red, green, orange, purple 
# add the first search as red circle markers

#function to add in circle markers

def placeMarkers (color, df):

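    #venues without an address get an empty string, so the label is still the venue name
    for lat, lng, label in zip(df["location.lat"], df["location.lng"], df["name"] + ", " + df["location.address"].fillna("")):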
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color=color,
            popup=label,
            fill = True,
            fill_color=color,
            fill_opacity=0.6
        ).add_to(venues_map)
    return None
   
#place the markers
if len (input1) > 1:
    
    placeMarkers ("red", df_1)
if len (input2) > 1:
    placeMarkers ("green", df_2) 
if len (input3) > 1:
    placeMarkers ("orange", df_3)
if len (input4) > 1:
    placeMarkers ("purple", df_4)
   
#add in the labels
venues_map.get_root().add_child(macro)
# display map
venues_map





