# Predicting the Success of Starbucks Locations (part 1)

**Author**: <a href = "https://www.linkedin.com/in/alexis-raymond-telfer/">Alexis Raymond</a>  
**Date Modified**: 2019-04-18

## Table of Contents

1. [Introduction: Business Problem](#introduction)
2. [Data Collection and Cleaning](#data)

## 1. Introduction: Business Problem <a name="introduction"></a>

Founded in Seattle, Washington in 1971, **Starbucks** is one of the most important and successful coffee companies in the world today. Some of its stores have been extremely successful, whereas others have had to close their doors.

This project will focus on the most important determining factor in a Starbucks store's success: its location. This includes the competition in the area (*supply*), as well as the demographics of the targeted customers (*demand*).

**Hence, this project aims to predict whether or not a Starbucks will be successful based on its location.**

### Import analysis libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

## 2. Data Collection and Cleaning <a name="data"></a>

To achieve the objective stated above, two datasets and a location data API (in this case, Foursquare) are needed. First, Kaggle offers a free dataset containing a record for every Starbucks store in operation as of February 2017 (<a href="https://www.kaggle.com/starbucks/store-locations">Starbucks Locations Worldwide</a>). Second, a list of all US cities with a population of at least 100,000 people as of July 1, 2017 can be scraped from the following Wikipedia page: <a href="https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population">List of United States cities by population</a>. This dataset also contains each city’s area and population density. Finally, the <a href="https://developer.foursquare.com/places-api">Foursquare Places API</a> will provide a rating for every location of interest as well as a detailed list of its nearby venues.

### List of Starbucks Locations

In [2]:
starbucks_locations = pd.read_csv('starbucks_locations.csv') # Convert CSV to dataframe
starbucks_locations.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude
0,Starbucks,47370-257954,"Meritxell, 96",Licensed,"Av. Meritxell, 96",Andorra la Vella,7,AD,AD500,376818720.0,GMT+1:00 Europe/Andorra,1.53,42.51
1,Starbucks,22331-212325,Ajman Drive Thru,Licensed,"1 Street 69, Al Jarf",Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.42
2,Starbucks,47089-256771,Dana Mall,Licensed,Sheikh Khalifa Bin Zayed St.,Ajman,AJ,AE,,,GMT+04:00 Asia/Dubai,55.47,25.39
3,Starbucks,22126-218024,Twofour 54,Licensed,Al Salam Street,Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.38,24.48
4,Starbucks,17127-178586,Al Ain Tower,Licensed,"Khaldiya Area, Abu Dhabi Island",Abu Dhabi,AZ,AE,,,GMT+04:00 Asia/Dubai,54.54,24.51


In [3]:
# Remove useless features
starbucks_locations = starbucks_locations[['City', 'Country', 'Store Number', 'Longitude', 'Latitude']] 
starbucks_locations.head()

Unnamed: 0,City,Country,Store Number,Longitude,Latitude
0,Andorra la Vella,AD,47370-257954,1.53,42.51
1,Ajman,AE,22331-212325,55.47,25.42
2,Ajman,AE,47089-256771,55.47,25.39
3,Abu Dhabi,AE,22126-218024,54.38,24.48
4,Abu Dhabi,AE,17127-178586,54.54,24.51


### List of US Cities by Population

In [4]:
us_cities = pd.read_csv('us_cities.csv') # Convert CSV to dataframe
us_cities.head()

Unnamed: 0,2017rank,City,State[5],2017estimate,2010Census,Change,2016 land area,2016 land area.1,2016 population density,2016 population density.1,Location
0,1,New York[6],New York,8622698,8175133,+5.47%,301.5 sq mi,780.9 km2,"28,317/sq mi","10,933/km2",40°39′49″N 73°56′19″W﻿ / ﻿40.6635°N 73.9387°W
1,2,Los Angeles,California,3999759,3792621,+5.46%,468.7 sq mi,"1,213.9 km2","8,484/sq mi","3,276/km2",34°01′10″N 118°24′39″W﻿ / ﻿34.0194°N 118.4108°W
2,3,Chicago,Illinois,2716450,2695598,+0.77%,227.3 sq mi,588.7 km2,"11,900/sq mi","4,600/km2",41°50′15″N 87°40′54″W﻿ / ﻿41.8376°N 87.6818°W
3,4,Houston[7],Texas,2312717,2100263,+10.12%,637.5 sq mi,"1,651.1 km2","3,613/sq mi","1,395/km2",29°47′12″N 95°23′27″W﻿ / ﻿29.7866°N 95.3909°W
4,5,Phoenix,Arizona,1626078,1445632,+12.48%,517.6 sq mi,"1,340.6 km2","3,120/sq mi","1,200/km2",33°34′20″N 112°05′24″W﻿ / ﻿33.5722°N 112.0901°W


In [5]:
# Remove useless features
us_cities = us_cities[['City', '2016 land area.1', '2016 population density.1']]

# Rename Columns
us_cities.columns = ['City', 'Area', 'Density']
us_cities.head()

Unnamed: 0,City,Area,Density
0,New York[6],780.9 km2,"10,933/km2"
1,Los Angeles,"1,213.9 km2","3,276/km2"
2,Chicago,588.7 km2,"4,600/km2"
3,Houston[7],"1,651.1 km2","1,395/km2"
4,Phoenix,"1,340.6 km2","1,200/km2"


In [6]:
# Define function to remove brackets from city name
def clean_city_name(city_name) :
    if city_name.find('[') == -1 : # If there aren't brackets in the city name, return the city name
        return city_name
    
    else : # If there are brackets in the city name, return the city name without the brackets
        return city_name[:city_name.find('[')]

In [7]:
# Clean city names
us_cities['City'] = us_cities['City'].apply(clean_city_name)
us_cities.head()

Unnamed: 0,City,Area,Density
0,New York,780.9 km2,"10,933/km2"
1,Los Angeles,"1,213.9 km2","3,276/km2"
2,Chicago,588.7 km2,"4,600/km2"
3,Houston,"1,651.1 km2","1,395/km2"
4,Phoenix,"1,340.6 km2","1,200/km2"
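The same cleanup can also be written with pandas' vectorized string methods instead of a row-by-row `apply`. This is a minimal sketch on a toy frame (the name `toy_cities` is hypothetical), assuming a footnote marker always starts at the first `[` in the name:

```python
import pandas as pd

# Toy data shaped like the scraped 'City' column (hypothetical sample)
toy_cities = pd.DataFrame({'City': ['New York[6]', 'Los Angeles', 'Houston[7]']})

# Keep everything before the first '[' (names without a bracket are unchanged)
toy_cities['City'] = toy_cities['City'].str.split('[', n=1).str[0]
print(toy_cities['City'].tolist())
```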


In [8]:
# Check which cities are not unique
city_counts = us_cities['City'].value_counts()
city_counts = city_counts[city_counts > 1]
city_counts

Springfield    3
Richmond       2
Glendale       2
Pasadena       2
Rochester      2
Columbus       2
Aurora         2
Kansas City    2
Peoria         2
Columbia       2
Lakewood       2
Name: City, dtype: int64

In [9]:
# Make a python list of all duplicate cities
duplicates = city_counts.index.tolist()
duplicates

['Springfield',
 'Richmond',
 'Glendale',
 'Pasadena',
 'Rochester',
 'Columbus',
 'Aurora',
 'Kansas City',
 'Peoria',
 'Columbia',
 'Lakewood']

In [10]:
# Identify cities that are in multiple states
us_cities['Unique'] = us_cities['City'].apply(lambda city: city not in duplicates)

# Remove duplicate cities from the dataframe
us_cities = us_cities[us_cities['Unique'] == True]

# Remove unique column
us_cities = us_cities.drop(columns = 'Unique')

us_cities['City'].value_counts().head()

Huntington Beach    1
St. Petersburg      1
Roseville           1
Gilbert             1
Syracuse            1
Name: City, dtype: int64
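The three steps above (count, list, filter) could likely be collapsed into one with `Series.duplicated`. A small sketch on made-up rows, where `toy_cities` and its values are illustrative only:

```python
import pandas as pd

# Made-up rows: 'Springfield' appears twice, as it would for two different states
toy_cities = pd.DataFrame({
    'City': ['Springfield', 'Chicago', 'Springfield', 'Phoenix'],
    'Area': ['1', '2', '3', '4'],
})

# keep=False marks every occurrence of a repeated name, not just the later ones
ambiguous = toy_cities['City'].duplicated(keep=False)
unique_cities = toy_cities[~ambiguous]
print(unique_cities['City'].tolist())
```

Dropping every ambiguous name, as the notebook does, avoids matching a store to the wrong state, at the cost of losing those cities entirely.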

In [11]:
# Remove units from Area and Density columns
us_cities['Area'] = us_cities['Area'].apply(lambda area: area[:area.find('\xa0km2')])
us_cities['Density'] = us_cities['Density'].apply(lambda density: density[:density.find('/km')])
us_cities.head()

Unnamed: 0,City,Area,Density
0,New York,780.9,10933
1,Los Angeles,1213.9,3276
2,Chicago,588.7,4600
3,Houston,1651.1,1395
4,Phoenix,1340.6,1200


In [12]:
# Remove commas from Area and Density columns
us_cities['Area'] = us_cities['Area'].apply(lambda area: area.replace(',',''))
us_cities['Density'] = us_cities['Density'].apply(lambda density: density.replace(',',''))
us_cities.head()

Unnamed: 0,City,Area,Density
0,New York,780.9,10933
1,Los Angeles,1213.9,3276
2,Chicago,588.7,4600
3,Houston,1651.1,1395
4,Phoenix,1340.6,1200
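Note that after these two steps the `Area` and `Density` columns still hold strings, not numbers. One possible way to strip the units, drop the commas, and cast in a single pass is sketched below; the toy frame is hypothetical and the regular expression assumes each value starts with the number:

```python
import pandas as pd

# Toy values in the same raw format as the scraped table (non-breaking space before 'km2')
toy = pd.DataFrame({
    'Area': ['780.9\xa0km2', '1,213.9\xa0km2'],
    'Density': ['10,933/km2', '3,276/km2'],
})

for column in ['Area', 'Density']:
    # Capture the leading number (digits, commas, decimal point)
    number = toy[column].str.extract(r'^([\d,.]+)', expand=False)
    # Drop thousands separators, then convert the strings to numbers
    toy[column] = pd.to_numeric(number.str.replace(',', '', regex=False))

print(toy.dtypes)
```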


### Combine Starbucks Dataframe with US Cities

In [13]:
starbucks_locations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25600 entries, 0 to 25599
Data columns (total 5 columns):
City            25585 non-null object
Country         25600 non-null object
Store Number    25600 non-null object
Longitude       25599 non-null float64
Latitude        25599 non-null float64
dtypes: float64(2), object(3)
memory usage: 1000.1+ KB


In [14]:
# Remove all Starbucks that are not in the US
starbucks_locations = starbucks_locations[starbucks_locations['Country'] == 'US']
starbucks_locations['City'].value_counts().head()

New York     232
Chicago      180
Seattle      156
Las Vegas    156
Houston      154
Name: City, dtype: int64

In [15]:
# Make list of all US cities with a population larger than 100 000 
us_cities_list = us_cities['City'].tolist()

In [16]:
# Identify Starbucks that are not in the identified cities
starbucks_locations['US'] = starbucks_locations['City'].apply(lambda city: city in us_cities_list)

# Remove Starbucks that are not in the identified cities from the dataframe
starbucks_locations = starbucks_locations[starbucks_locations['US'] == True]

# Remove US column
starbucks_locations = starbucks_locations.drop(columns = 'US')

starbucks_locations['City'].value_counts().head()

New York     232
Chicago      180
Seattle      156
Las Vegas    156
Houston      154
Name: City, dtype: int64

In [17]:
# Add area and density for each starbucks location
starbucks_locations = starbucks_locations.merge(us_cities, how = 'inner', on = 'City')
starbucks_locations.head()

Unnamed: 0,City,Country,Store Number,Longitude,Latitude,Area,Density
0,Anchorage,US,3513-125945,-149.78,61.21,4420.1,68
1,Anchorage,US,74352-84449,-149.84,61.14,4420.1,68
2,Anchorage,US,12449-152385,-149.85,61.11,4420.1,68
3,Anchorage,US,24936-233524,-149.89,61.13,4420.1,68
4,Anchorage,US,8973-85630,-149.86,61.14,4420.1,68


In [18]:
starbucks_locations.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5869 entries, 0 to 5868
Data columns (total 7 columns):
City            5869 non-null object
Country         5869 non-null object
Store Number    5869 non-null object
Longitude       5869 non-null float64
Latitude        5869 non-null float64
Area            5869 non-null object
Density         5869 non-null object
dtypes: float64(2), object(5)
memory usage: 366.8+ KB
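Because the merge relies on each city name appearing only once in `us_cities`, it may be worth letting pandas check that assumption. The sketch below uses the `validate` argument of `DataFrame.merge` on two toy frames (`stores` and `cities` are hypothetical names and values):

```python
import pandas as pd

stores = pd.DataFrame({'City': ['Anchorage', 'Anchorage', 'Paris'],
                       'Store Number': ['a-1', 'a-2', 'p-1']})
cities = pd.DataFrame({'City': ['Anchorage', 'Seattle'],
                       'Area': [4420.1, 217.0]})

# validate='many_to_one' raises a MergeError if a city name is repeated in `cities`
merged = stores.merge(cities, how='inner', on='City', validate='many_to_one')
print(merged)
```

The inner join also silently drops stores whose city is absent from the city table, which is the behaviour the notebook relies on.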


### Find Nearby Venues for Each Location

In [19]:
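# Foursquare credentials, read from environment variables instead of being hard-coded
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are placeholders)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20190220'
LIMIT = 50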

In [20]:
# Define function to retrieve nearby venues
def getNearbyVenues(numbers, latitudes, longitudes, radius=500):
    
    # Create an empty list of venues
    venues_list=[]
    
    # Initiate counters
    no_nearby_venues = 0
    with_nearby_venues = 0
    
    # Loop through Starbucks locations
    for number, lat, lng in zip(numbers, latitudes, longitudes):
        
        try : # Try to find nearby venues to the Starbucks
            
            # Create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                LIMIT)

            # Make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']

            # Return only relevant information for each nearby venue
            venues_list.append([(number, v['venue']['categories'][0]['name']) for v in results])
            
            # Add 1 to the count of Starbucks for which the request succeeded
            with_nearby_venues += 1
    
        except Exception : # Add 1 to the count of Starbucks with no nearby venues if the request or parsing fails
            no_nearby_venues += 1
    
    # Create dataframe with close venues to each Starbucks
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Store Number', 'Venue Category']
    
    # Print the number of Starbucks for which no nearby venues were found
    print('Starbucks with no nearby venues:', no_nearby_venues)
    
    # Print the number of Starbucks for which at least one nearby venue was found
    print('Starbucks with at least one nearby venue:', with_nearby_venues)
    
    # Return nearby venues dataframe
    return nearby_venues

In [21]:
# Create dataframe with each nearby venue's category to each Starbucks
nearby_venues = getNearbyVenues(numbers = starbucks_locations['Store Number'],
                                latitudes = starbucks_locations['Latitude'],
                                longitudes = starbucks_locations['Longitude'])
nearby_venues.head()

Starbucks with no nearby venues: 3
Starbucks with at least one nearby venue: 5866


Unnamed: 0,Store Number,Venue Category
0,3513-125945,Pizza Place
1,3513-125945,Ice Cream Shop
2,3513-125945,Coffee Shop
3,3513-125945,Supermarket
4,3513-125945,Intersection
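In `getNearbyVenues`, a venue without any category raises an `IndexError`, which discards every venue for that store. A more tolerant parsing step might look like the sketch below. The helper `extract_categories` is hypothetical, and the payload shape is assumed only from the fields the notebook itself reads (`response`, `groups`, `items`, `venue`, `categories`, `name`); the sample payload is hand-written, not real API output:

```python
def extract_categories(store_number, payload):
    """Return (store_number, category) pairs from an explore-style response dict."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    pairs = []
    for item in items:
        categories = item.get('venue', {}).get('categories', [])
        if categories:  # skip venues that carry no category
            pairs.append((store_number, categories[0]['name']))
    return pairs

# Hand-written payload mirroring the fields used above
fake_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'categories': [{'name': 'Pizza Place'}]}},
        {'venue': {'categories': []}},
        {'venue': {'categories': [{'name': 'Coffee Shop'}]}},
    ]}]}
}

pairs = extract_categories('3513-125945', fake_payload)
empty = extract_categories('3513-125945', {'response': {'groups': []}})
print(pairs, empty)
```

Separating parsing from the HTTP call also makes it possible to test the logic without spending API quota.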


In [22]:
# One hot encoding
starbucks_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# Add store number column back to dataframe
starbucks_onehot['Store Number'] = nearby_venues['Store Number'] 

# Move store number column to the first column
fixed_columns = [starbucks_onehot.columns[-1]] + list(starbucks_onehot.columns[:-1])
starbucks_onehot = starbucks_onehot[fixed_columns]

# Show the first 5 entries of the dataframe
starbucks_onehot.head()

Unnamed: 0,Store Number,ATM,Accessories Store,Acupuncturist,Adult Boutique,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,...,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit
0,3513-125945,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,3513-125945,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3513-125945,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3513-125945,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,3513-125945,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
# Group rows by store number, taking the mean frequency of occurrence of each category
starbucks_grouped = starbucks_onehot.groupby('Store Number').mean().reset_index()

# Set store number as index
starbucks_grouped.set_index('Store Number', inplace = True)
starbucks_grouped.head()

Store Number,ATM,Accessories Store,Acupuncturist,Adult Boutique,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,...,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit
10001-99525,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10005-97691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10008-98261,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10009-98952,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10012-98199,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
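One-hot encoding followed by a grouped mean yields, for each store, the share of nearby venues falling in each category. As an illustration on toy data (the frame `toy_venues` is made up), `pd.crosstab` with `normalize='index'` should produce the same kind of table in one call:

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Store Number': ['s1', 's1', 's1', 's1', 's2', 's2'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Pizza Place', 'Bank',
                       'Coffee Shop', 'Bank'],
})

# normalize='index' divides each row by its total, giving per-store category shares
shares = pd.crosstab(toy_venues['Store Number'], toy_venues['Venue Category'],
                     normalize='index')
print(shares)
```

Each row sums to 1, so a value such as 0.058824 in the table above corresponds to one venue out of seventeen.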


In [24]:
# Set Store Number as index
starbucks_locations.set_index('Store Number', inplace = True)

In [25]:
# Add nearby venues to each starbucks locations
starbucks_locations = starbucks_locations.join(starbucks_grouped, how = 'inner')
starbucks_locations.head()

Store Number,City,Country,Longitude,Latitude,Area,Density,ATM,Accessories Store,Acupuncturist,Adult Boutique,...,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit
3513-125945,Anchorage,US,-149.78,61.21,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
74352-84449,Anchorage,US,-149.84,61.14,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12449-152385,Anchorage,US,-149.85,61.11,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24936-233524,Anchorage,US,-149.89,61.13,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8973-85630,Anchorage,US,-149.86,61.14,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Add Distance from Headquarters for Each Location

In [26]:
import math # library to handle math operations

In [27]:
# Define function to find the distance between a Starbucks location and headquarters
def find_distance_to_HQ(row, HQ_LAT = 47.580700, HQ_LNG = -122.336000) :
    
    # Assign latitude and longitude variables
    lat = row['Latitude']
    lng = row['Longitude']
    
    # Find north-south (latitude) and east-west (longitude) differences, in degrees
    vertical_distance = HQ_LAT - lat
    horizontal_distance = HQ_LNG - lng
    
    # Return straight-line distance in degrees (a rough proxy, not kilometres)
    return math.sqrt(horizontal_distance**2 + vertical_distance**2)

In [28]:
# Add column with distance between location and Starbucks HQ
starbucks_locations['Distance to HQ'] = starbucks_locations.apply(find_distance_to_HQ, axis = 1)
starbucks_locations.head()

Store Number,City,Country,Longitude,Latitude,Area,Density,ATM,Accessories Store,Acupuncturist,Adult Boutique,...,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit,Distance to HQ
3513-125945,Anchorage,US,-149.78,61.21,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.64198
74352-84449,Anchorage,US,-149.84,61.14,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.664713
12449-152385,Anchorage,US,-149.85,61.11,4420.1,68,0.0,0.0,0.0,0.0,...,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.660433
24936-233524,Anchorage,US,-149.89,61.13,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.705153
8973-85630,Anchorage,US,-149.86,61.14,4420.1,68,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.682653
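The `Distance to HQ` values are Euclidean distances computed on raw degrees, so they are not kilometres, and a degree of longitude covers less ground at higher latitudes. If a physical distance were wanted, the haversine formula is a common choice. The function below is a sketch rather than part of the original pipeline; `haversine_km` is a hypothetical name and the Earth is assumed to be a sphere of radius 6371 km:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# HQ coordinates from the notebook and the first Anchorage store in the table
distance = haversine_km(47.5807, -122.336, 61.21, -149.78)
print(round(distance), 'km')
```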


### Find Ratings for 500 Locations

In [29]:
# Define function to find distance between location and venue
def find_distance_location_venue(lat_location, lng_location, lat_venue, lng_venue) :
    
    # Find north-south (latitude) and east-west (longitude) differences, in degrees
    vertical_distance = lat_venue - lat_location
    horizontal_distance = lng_venue - lng_location
    
    # Return straight-line distance in degrees (used only to rank nearby venues)
    return math.sqrt(horizontal_distance**2 + vertical_distance**2)

In [30]:
# Define function to retrieve the Foursquare ID of the Starbucks location
def find_ID(row, radius=500) :
    
    # Get latitude and longitude of location
    lat = row['Latitude']
    lng = row['Longitude']
    
    # Create an empty list of venues
    venues_list=[]
        
    try : # Try to make an API request for nearby venues
        
        # Create the API request URL for nearby venues
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # Make the GET request for nearby venues
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
    except Exception : # Return 'Request failed' if error occurs
        return 'Request failed'
        
    try : # Try to store the relevant information for each nearby venue in a dataframe
        
        # Return only relevant information for each nearby venue
        venues_list.append([(
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['id'],
            find_distance_location_venue(lat, lng, v['venue']['location']['lat'], v['venue']['location']['lng'])) for v in results])
    
        # Create dataframe with close venues to each Starbucks
        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = [ 'Venue', 
                                  'Venue Latitude', 
                                  'Venue Longitude', 
                                  'Venue ID',
                                  'Distance']
        
    except Exception : # Return 'Dataframe creation failed' if error occurs
        return 'Dataframe creation failed'
    
    try : # Try to find the closest Starbucks to the location
        
        # Find all Starbucks in a radius of 500m
        nearby_venues = nearby_venues[nearby_venues['Venue'] == 'Starbucks']

        # Keep only the closest Starbucks and extract its ID
        starbucks_id_series = nearby_venues[nearby_venues['Distance'] == nearby_venues['Distance'].min()]['Venue ID']
        starbucks_id = starbucks_id_series[starbucks_id_series.index[0]]
    
    except Exception : # Return 'Starbucks not found' if error occurs
        return 'Starbucks not found'
    
    # If everything worked, return the Foursquare ID for the Starbucks location
    return starbucks_id
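The final step of `find_ID`, picking the closest venue named "Starbucks", can be illustrated in isolation. The sketch below uses a hand-made frame (`toy_nearby` and its IDs are invented) and `Series.idxmin` to select the row with the smallest distance:

```python
import pandas as pd

# Invented nearby venues; distances are in degrees, as in the notebook
toy_nearby = pd.DataFrame({
    'Venue': ['Starbucks', 'Pizza Place', 'Starbucks'],
    'Venue ID': ['id-far', 'id-pizza', 'id-near'],
    'Distance': [0.004, 0.001, 0.002],
})

# Keep only venues named exactly 'Starbucks', then take the one with the smallest distance
starbucks_only = toy_nearby[toy_nearby['Venue'] == 'Starbucks']
closest_id = starbucks_only.loc[starbucks_only['Distance'].idxmin(), 'Venue ID']
print(closest_id)
```

The exact-name match is strict: a venue listed under a variant such as "Starbucks Reserve" would not be found, which may explain part of the "Starbucks not found" count.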

In [31]:
# Create column with the Foursquare ID of each Starbucks location found
starbucks_locations['Foursquare ID'] = starbucks_locations.apply(find_ID, axis = 1)

In [32]:
# Print how many errors occurred for each type
print('Request errors:', starbucks_locations[starbucks_locations['Foursquare ID'] == 'Request failed']['Foursquare ID'].count())
print('Starbucks not found:', starbucks_locations[starbucks_locations['Foursquare ID'] == 'Starbucks not found']['Foursquare ID'].count())

# Print how many successes occurred
print('Starbucks ID found:', starbucks_locations[(starbucks_locations['Foursquare ID'] != 'Starbucks not found') & (starbucks_locations['Foursquare ID'] != 'Request failed')]['Foursquare ID'].count())

Request errors: 2
Starbucks not found: 2019
Starbucks ID found: 3807


In [33]:
# Remove Starbucks locations with missing IDs (any of the three error strings returned by find_ID)
id_errors = ['Request failed', 'Dataframe creation failed', 'Starbucks not found']
starbucks_locations = starbucks_locations[~starbucks_locations['Foursquare ID'].isin(id_errors)]

In [34]:
# Import random library
import random

In [35]:
# Reset index
starbucks_locations.reset_index(inplace = True)

In [36]:
# Draw 500 unique random row positions
# (random.sample never repeats a value and stays within range(n),
# whereas random.randint(0, n) can return n, which is out of bounds for iloc)
random_indexes = random.sample(range(starbucks_locations.shape[0]), 500)

# Create dataframe with 500 random locations
random_starbucks_locations = starbucks_locations.iloc[random_indexes]

random_starbucks_locations.info()
random_starbucks_locations.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 500 entries, 1086 to 306
Columns: 623 entries, Store Number to Foursquare ID
dtypes: float64(617), object(6)
memory usage: 2.4+ MB


Unnamed: 0,Store Number,City,Country,Longitude,Latitude,Area,Density,ATM,Accessories Store,Acupuncturist,...,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit,Distance to HQ,Foursquare ID
1086,8750-94815,San Diego,US,-116.97,32.63,842.3,1670,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.884501,4b169d71f964a520f2ba23e3
1862,72486-20060,Dallas,US,-97.04,32.89,882.9,1493,0.0,0.027027,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,29.252424,54dca7d5498ee741d06aea01
2660,72575-103057,Bellevue,US,-122.15,47.6,86.8,1630,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.186999,4b667e10f964a52091222be3
2711,79491-106608,Omaha,US,-95.99,41.24,345.0,1296,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,27.098269,4ba3c0c7f964a520775b38e3
3703,14247-137176,Seattle,US,-122.33,47.57,217.0,3245,0.0,0.0,0.0,...,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012267,49e7b460f964a5200b651fe3
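Since the sampled rows change on every run, a seeded draw is one way to make this step repeatable. A minimal sketch with `DataFrame.sample` follows; the toy frame and the seed value 42 are arbitrary choices for illustration:

```python
import pandas as pd

toy_locations = pd.DataFrame({'Store Number': [f'store-{i}' for i in range(1000)]})

# sample() draws without replacement by default; random_state makes the draw repeatable
sample_a = toy_locations.sample(n=500, random_state=42)
sample_b = toy_locations.sample(n=500, random_state=42)
print(sample_a.index.equals(sample_b.index))
```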


In [37]:
# Define function to retrieve the location's rating
def find_Rating(starbucks_id):
    
    try : # Try to make an API request for details on this Starbucks
        
        # Create the API request URL for details on this Starbucks
        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            starbucks_id,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            )

        # Make the GET request for details on this Starbucks
        results = requests.get(url).json()['response']['venue']
        
    except Exception : # Return 'Request failed' if error occurs
        return 'Request failed'
    
    try : # Try to find the rating and rating count of this Starbucks
        
        # Get rating and rating count of this Starbucks
        rating = results['rating']
        rating_count = results['ratingSignals']
        
    except Exception : # Return 'Rating error' if error occurs
        return 'Rating error'
    
    # Reject rating if it is based on fewer than 10 individual ratings
    if rating_count < 10 :
        return 'Not enough ratings'
    
    # Return this Starbucks' rating
    return rating
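Mixing floats and error strings in one column forces the string comparisons used later. An alternative is to return `NaN` for every unusable rating, as in the sketch below. The helper `extract_rating` is hypothetical; it assumes only the two fields the notebook reads (`rating` and `ratingSignals`) and gives up the distinction between a missing rating and too few ratings:

```python
import math

def extract_rating(venue, min_signals=10):
    """Return the rating as a float, or NaN when it is missing or based on too few ratings."""
    rating = venue.get('rating')
    signals = venue.get('ratingSignals', 0)
    if rating is None or signals < min_signals:
        return math.nan
    return float(rating)

print(extract_rating({'rating': 7.5, 'ratingSignals': 40}))
print(extract_rating({'rating': 8.0, 'ratingSignals': 3}))
```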

In [38]:
# Create column with the rating for each Starbucks location
# (work on an explicit copy so pandas does not raise SettingWithCopyWarning)
random_starbucks_locations = random_starbucks_locations.copy()
random_starbucks_locations['Rating'] = random_starbucks_locations['Foursquare ID'].apply(find_Rating)


In [39]:
# Print how many errors occurred for each type
print('Request errors:', random_starbucks_locations[random_starbucks_locations['Rating'] == 'Request failed']['Rating'].count())
print('Rating errors:', random_starbucks_locations[random_starbucks_locations['Rating'] == 'Rating error']['Rating'].count())
print('Not enough ratings:', random_starbucks_locations[random_starbucks_locations['Rating'] == 'Not enough ratings']['Rating'].count())

# Print how many successes occurred
print('Starbucks ratings found:', random_starbucks_locations[(random_starbucks_locations['Rating'] != 'Request failed') & (random_starbucks_locations['Rating'] != 'Rating error') & (random_starbucks_locations['Rating'] != 'Not enough ratings')]['Rating'].count())

Request errors: 0
Rating errors: 2
Not enough ratings: 114
Starbucks ratings found: 384


In [40]:
random_starbucks_locations.head()

Unnamed: 0,Store Number,City,Country,Longitude,Latitude,Area,Density,ATM,Accessories Store,Acupuncturist,...,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit,Distance to HQ,Foursquare ID,Rating
1086,8750-94815,San Diego,US,-116.97,32.63,842.3,1670,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15.884501,4b169d71f964a520f2ba23e3,7.5
1862,72486-20060,Dallas,US,-97.04,32.89,882.9,1493,0.0,0.027027,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,29.252424,54dca7d5498ee741d06aea01,8.2
2660,72575-103057,Bellevue,US,-122.15,47.6,86.8,1630,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.186999,4b667e10f964a52091222be3,5.7
2711,79491-106608,Omaha,US,-95.99,41.24,345.0,1296,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,27.098269,4ba3c0c7f964a520775b38e3,Not enough ratings
3703,14247-137176,Seattle,US,-122.33,47.57,217.0,3245,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012267,49e7b460f964a5200b651fe3,6.7


In [41]:
# Remove Starbucks locations with missing ratings
starbucks_ratings = random_starbucks_locations[(random_starbucks_locations['Rating'] != 'Request failed') & (random_starbucks_locations['Rating'] != 'Rating error') & (random_starbucks_locations['Rating'] != 'Not enough ratings')]
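The filter above leaves `Rating` as an object column. A shorter way to both drop the error strings and obtain a numeric column might be `pd.to_numeric` with `errors='coerce'`, shown here on a toy frame (the values are invented):

```python
import pandas as pd

toy = pd.DataFrame({'Rating': [7.5, 'Not enough ratings', 8.2, 'Rating error']})

# Non-numeric sentinel strings become NaN, which dropna() then removes
toy['Rating'] = pd.to_numeric(toy['Rating'], errors='coerce')
rated = toy.dropna(subset=['Rating'])
print(rated['Rating'].tolist())
```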

In [42]:
import folium # Import library to create maps

In [43]:
# Create map of Starbucks locations with ratings

# Create simple map of the US
ratings_map = folium.Map(location=[40.001345, -99.092845], zoom_start=4.0)

# Add markers to the map
for lat, lon in zip(starbucks_ratings['Latitude'], starbucks_ratings['Longitude']):
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(ratings_map)

# Display map
ratings_map

In [46]:
# Export dataset to CSV to continue with EDA and predictive modeling
starbucks_ratings.to_csv('starbucks_ratings.csv')

*Note: This notebook represents only the first half of the project. It was used to generate the dataset for the exploratory data analysis and the modeling. Since this dataset has a random component, I decided to keep its values constant while developing the second part. The second notebook can be found <a href = "https://github.com/alexis-raymond/Coursera_Data-Science_Capstone">here</a>.*