# Finding a New Pet Store Location in St. Paul, MN

## Introduction / Business Problem

In 2019, US residents were estimated to spend [more than $75 billion](https://www.americanpetproducts.org/press_industrytrends.asp) on their pets. With more than 40% of that going to food, a pet store can be a lucrative business to get into.

Our friend, John Doe, wants to open a new pet store in the St. Paul, MN area and has asked me to help find a location for it. The cost of doing business in a metropolitan city like St. Paul can be high, so the location needs to be analyzed carefully. The analysis should give a better understanding of the target market and reduce the risk of opening in the wrong part of the city.

## Target Audience

John Doe is the target audience for this project. The objective is to identify a neighborhood in the city of St. Paul, MN where he can open his new pet store. The goal is to make sure the neighborhood I pick is the best available option.

## Data

1. Zip code data from the [uszipcode](https://uszipcode.readthedocs.io/index.html#) Python package. It provides the **latitude and longitude** of every zip code in the St. Paul, MN area, along with the **median household income**.

In [1]:
# importing packages
import pandas as pd
import numpy as np
from uszipcode import SearchEngine
import requests
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

In [2]:
# using the uszipcode python package
searchzip = SearchEngine(simple_zipcode=False) # set simple_zipcode=False to use rich info database

In [3]:
# get the zip codes for the St. Paul, MN area (up to 40 results):
res = searchzip.by_city_and_state("Saint Paul", "MN", returns=40)

Let's create a dataframe containing all the zip codes for the St. Paul, MN area:

In [4]:
zipStPaul = pd.DataFrame(columns = ['zipcode','med_income','latitude','longitude'])

In [5]:
# collect one record per zip code, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in res:
    rows.append({'zipcode': data.zipcode,
                 'med_income': data.median_household_income,
                 'latitude': data.lat,
                 'longitude': data.lng})

zipStPaul = pd.DataFrame(rows, columns=['zipcode','med_income','latitude','longitude'])

In [6]:
zipStPaul.head()

| | zipcode | med_income | latitude | longitude |
|---|---|---|---|---|
| 0 | 55101 | 40300 | 44.95 | -93.09 |
| 1 | 55102 | 46255 | 44.93 | -93.12 |
| 2 | 55103 | 28899 | 44.97 | -93.13 |
| 3 | 55104 | 44629 | 44.96 | -93.17 |
| 4 | 55105 | 76472 | 44.93 | -93.16 |


In [7]:
zipStPaul.shape

(30, 4)
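
Before these rows are used downstream, it may be worth checking them for gaps. The sketch below assumes that some fields could come back empty or be stored as generic objects rather than numbers, which should be verified against the actual result. The first five rows are copied from the output above; the sixth row (zip `00000`, no income) is a hypothetical placeholder added only to show how a gap would be handled.

```python
import pandas as pd

# First five rows copied from the output above; the last row is hypothetical
# and exists only to illustrate a zip code with no income figure.
sample = pd.DataFrame(
    {
        "zipcode": ["55101", "55102", "55103", "55104", "55105", "00000"],
        "med_income": [40300, 46255, 28899, 44629, 76472, None],
        "latitude": [44.95, 44.93, 44.97, 44.96, 44.93, 44.95],
        "longitude": [-93.09, -93.12, -93.13, -93.17, -93.16, -93.10],
    },
    dtype=object,
)

# Coerce the numeric columns one by one; anything unparseable becomes NaN.
numeric_cols = ["med_income", "latitude", "longitude"]
for col in numeric_cols:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Count gaps per column, then keep only the complete rows.
missing_per_column = sample[numeric_cols].isna().sum()
clean = sample.dropna(subset=numeric_cols).reset_index(drop=True)

print(missing_per_column)
print(clean.shape)
```

Dropping incomplete rows is only one option; keeping them and imputing a value would be another, depending on how the income column is used later.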

There are **thirty zip codes** that can be used with the Foursquare Places API.  
We will use the _folium_ Python package to visualize the data:

First, we use the geopy library to get the latitude and longitude of St. Paul, MN.

In [9]:
address = 'Saint Paul, MN'

geolocator = Nominatim(user_agent="spaul_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of St Paul, MN are {}, {}.'.format(latitude, longitude))

The geographical coordinates of St Paul, MN are 44.9504037, -93.1015026.


In [10]:
# create map of St. Paul using latitude and longitude values
map_stPaul = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, zipcode, medincome in zip(zipStPaul['latitude'], zipStPaul['longitude'], zipStPaul['zipcode'],zipStPaul['med_income']):
    label = '{}, {}'.format(zipcode, medincome)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_stPaul)  
    
map_stPaul

2. Venue data from the Foursquare [Places API](https://developer.foursquare.com/docs/api). I will use the explore endpoint to get venue recommendations and home in on the pet-related categories (e.g. pet store, dog park, dog-friendly restaurants).

In [12]:
# get the ID and secret from the obfuscated file 
CLIENT_ID = pd.read_csv('../../Coursera_Capstone/FSclientID.txt',header=None)[0][0] # your Foursquare ID
CLIENT_SECRET = pd.read_csv('../../Coursera_Capstone/FSclientSecret.txt',header=None)[0][0] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [13]:
# modified from previous work:
def getNearbyVenues(latitudes, longitudes, radius=500):
    """Query the explore endpoint around each point and return one row per venue."""
    venues_list = []
    for lat, lng in zip(latitudes, longitudes):
        # create the API request URL
        # (LIMIT is a module-level constant that must be set before this function is called)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and pull out the list of recommended items
        results = requests.get(url).json()
        items = results['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # headerLocation is the area name reported for the query point
        venues_list.append([(
            results['response']['headerLocation'],
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in items])

    # flatten the per-point lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
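
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` if a venue were returned without any category. Below is a minimal sketch of a more defensive row extractor, assuming the response layout read above (`response.groups[0].items`, `response.headerLocation`). The helper name `extract_venue_rows`, the `"Uncategorized"` fallback, and the sample payload are hypothetical and not part of the Foursquare API.

```python
def extract_venue_rows(results, lat, lng, fallback="Uncategorized"):
    """Flatten one explore-style response dict into a list of row tuples."""
    response = results.get("response", {})
    groups = response.get("groups") or []
    if not groups:
        return []  # nothing recommended near this point

    header = response.get("headerLocation")
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use the first category when present, otherwise the fallback label.
        category = categories[0]["name"] if categories else fallback
        rows.append((
            header, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload shaped like the fields the function above reads.
sample = {
    "response": {
        "headerLocation": "Lowertown",
        "groups": [{"items": [
            {"venue": {"name": "Mears Park",
                       "location": {"lat": 44.949371, "lng": -93.08792},
                       "categories": [{"name": "Park"}]}},
            {"venue": {"name": "Unlabeled Venue",
                       "location": {"lat": 44.95, "lng": -93.09},
                       "categories": []}},
        ]}],
    }
}

rows = extract_venue_rows(sample, 44.95, -93.09)
print(rows)
```

Returning an empty list for a point with no results keeps the later flattening step working without special cases.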

In [14]:
LIMIT = 100

In [15]:
stpaul_venues = getNearbyVenues(latitudes=zipStPaul['latitude'],
                               longitudes=zipStPaul['longitude'])

In [16]:
stpaul_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Lowertown | 44.95 | -93.09 | Barrel Theory Beer Company | 44.951021 | -93.088258 | Brewery |
| 1 | Lowertown | 44.95 | -93.09 | Mears Park | 44.949371 | -93.08792 | Park |
| 2 | Lowertown | 44.95 | -93.09 | The Buttered Tin | 44.950857 | -93.088679 | Bakery |
| 3 | Lowertown | 44.95 | -93.09 | Handsome Hog | 44.949584 | -93.089117 | Southern / Soul Food Restaurant |
| 4 | Lowertown | 44.95 | -93.09 | World of Beer | 44.94867 | -93.088063 | Beer Bar |


In [36]:
stpaul_venues.shape

(336, 7)
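
To narrow the venue table to the pet-related categories mentioned earlier, one option is a keyword match on `Venue Category`. The sketch below runs on a small stand-in frame rather than the real `stpaul_venues`. The keyword list and the last three sample rows are assumptions for illustration, and the real category names should be checked against what the API actually returns.

```python
import pandas as pd

# Stand-in for stpaul_venues: the first two rows come from the output above,
# the remaining rows are hypothetical examples.
venues = pd.DataFrame({
    "Neighborhood": ["Lowertown"] * 5,
    "Venue": ["Barrel Theory Beer Company", "Mears Park",
              "Example Pet Shop", "Example Dog Park", "Example Carpet Outlet"],
    "Venue Category": ["Brewery", "Park", "Pet Store", "Dog Run", "Carpet Store"],
})

# Whole-word match so that "Carpet Store" is not mistaken for a pet venue.
# The keyword list is an assumption, not an official category list.
pet_pattern = r"\b(?:pet|dog|veterinarian)\b"
is_pet = venues["Venue Category"].str.contains(pet_pattern, case=False, regex=True)

# Keep the matching venues and count them per neighborhood.
pet_venues = venues[is_pet]
pet_counts = pet_venues.groupby("Neighborhood").size()

print(pet_venues[["Venue", "Venue Category"]])
print(pet_counts)
```

The per-neighborhood counts could then be compared with the median household income from the zip code table when ranking candidate locations.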