# Consumer Store Location Data Scrape

This notebook uses Splinter and BeautifulSoup to scrape store location data from the websites of the following popular consumer chains:

* Starbucks
* McDonald's
* Target
* Walgreens
* Trader Joe's
* Safeway
* Nordstrom

Author: Cognitech LLC

### Setup and Dependencies

In [1]:
# Import necessary dependencies
from splinter import Browser
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
# determine the path to the Google Chrome driver and store it in a variable
chrome_driver_path = !which chromedriver

In [3]:
# Set the executable path and initialize the chrome browser in splinter
executable_path = {'executable_path': chrome_driver_path[0]}
chrome_browser = Browser('chrome', **executable_path)
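The `!which chromedriver` line above relies on IPython shell syntax, so it only works inside a notebook. A minimal sketch of the same lookup in plain Python, assuming the driver is on the `PATH`, could use the standard library's `shutil.which`. The helper name `locate_driver` is hypothetical and not part of the original notebook.

```python
import shutil


def locate_driver(binary_name='chromedriver'):
    """Return the full path to the driver binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


driver_path = locate_driver()
if driver_path is None:
    # Nothing to pass to Splinter; report the problem instead of failing later
    print('chromedriver was not found on the PATH.')
else:
    # Same dictionary shape the notebook passes to Browser('chrome', **executable_path)
    executable_path = {'executable_path': driver_path}
    print('Using driver at {0}'.format(driver_path))
```

Checking for `None` up front avoids the `IndexError` that `chrome_driver_path[0]` would raise when `which` returns no output.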

### 1) Starbucks

In [43]:
# loop through the first 500 Starbucks store IDs and collect each store's coordinates in a list
sb_coordinate_list = []
for num in range(1, 501):
    try:
        # Build the URL for this store on the Starbucks store locator website.
        # Individual store pages follow the numeric pattern below, so
        # iterating over the ID retrieves the location data for each store
        starbucks_url = 'https://www.starbucks.com/store-locator/store/{0}'.format(num)
        # have Chrome navigate to that URL
        chrome_browser.visit(starbucks_url)

        # Convert the browser HTML for the Starbucks store page into a Beautiful Soup object
        starbucks_html = chrome_browser.html
        starbucks_beautiful_soup = BeautifulSoup(starbucks_html, 'html.parser')

        # based on a review of the HTML, the JSON with the latitude and longitude appears to sit in a
        # <script> tag under a 'coordinates' key
        sb_script_elements = starbucks_beautiful_soup.find_all("script")

        # loop through all of the <script> tags to pull out the JSON with the coordinates of the store
        for element in sb_script_elements:
            if 'window.__BOOTSTRAP' in element.get_text():
                sb_coordinates = element.get_text().split('coordinates')[1].split('}')[0]
                sb_latitude = sb_coordinates.split(':')[2].split(',')[0]
                sb_longitude = sb_coordinates.split(':')[3]
                sb_store_name = element.get_text().split('stores')[1].split('name')[1].split(',')[0].split(':')[1]
                break
        else:
            # no bootstrap script on this page: raise so that values left over
            # from a previous store are not appended again
            raise ValueError('No bootstrap data found.')

        # store the coordinates in the list within a dictionary
        sb_store_dictionary = {'store_name': sb_store_name, 
                               'lat': sb_latitude, 'lng': sb_longitude}

        sb_coordinate_list.append(sb_store_dictionary)
    except Exception:
        print('No store number {0}.'.format(num))

No store number 2.
No store number 3.
No store number 4.
No store number 5.
No store number 7.
No store number 9.
No store number 10.
No store number 11.
No store number 12.
No store number 14.
No store number 15.
No store number 16.
No store number 18.
No store number 19.
No store number 20.
No store number 21.
No store number 22.
No store number 24.
No store number 26.
No store number 27.
No store number 28.
No store number 29.
No store number 30.
No store number 31.
No store number 32.
No store number 33.
No store number 35.
No store number 36.
No store number 38.
No store number 40.
No store number 43.
No store number 46.
No store number 47.
No store number 48.
No store number 53.
No store number 54.
No store number 55.
No store number 56.
No store number 57.
No store number 58.
No store number 59.
No store number 66.
No store number 67.
No store number 68.
No store number 69.
No store number 77.
No store number 78.
No store number 79.
No store number 81.
No store number 84.
No sto
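The chained `split` calls in the loop above depend on the exact position of each colon and comma, so a small change in the page can silently shift the values. A minimal sketch of a sturdier extraction is shown here, assuming the script text holds a JSON object right after a `"coordinates"` key with latitude first and longitude second (the same ordering the splits rely on). The sample payload, its key names, and the helper `extract_coordinates` are hypothetical and not taken from the Starbucks site.

```python
import json
import re

# Capture the flat JSON object that follows a "coordinates" key
COORDINATES_PATTERN = re.compile(r'"coordinates"\s*:\s*(\{[^{}]*\})')


def extract_coordinates(script_text):
    """Return (latitude, longitude) as floats, or None if no coordinates are found."""
    match = COORDINATES_PATTERN.search(script_text)
    if match is None:
        return None
    # Take the first two values in order, mirroring the position-based splits
    values = list(json.loads(match.group(1)).values())
    if len(values) < 2:
        return None
    return float(values[0]), float(values[1])


# Hypothetical script text; the real payload may be shaped differently
sample_script = (
    'window.__BOOTSTRAP = {"stores":[{"name":"Example Store",'
    '"coordinates":{"latitude":47.61,"longitude":-122.33}}]}'
)
print(extract_coordinates(sample_script))
```

Returning `None` for a page without coordinates lets the caller skip the store explicitly instead of relying on an exception.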

In [46]:
# store the coordinates of stores found in a csv
sb_coordinate_list_pd = pd.DataFrame(sb_coordinate_list)
# store this in a csv for the Flask app
sb_coordinate_list_pd.to_csv('../app/data/starbucks_coordinates.csv')
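Because the scraped values come from string splits, `lat` and `lng` are stored as text, and the store name may still carry its surrounding quotes. A minimal sketch of cleaning the frame before export follows, assuming that shape. The sample rows are made up for illustration, and the CSV is written to an in-memory buffer rather than to `../app/data/`.

```python
import io

import pandas as pd

# Hypothetical rows shaped like the dictionaries appended in the loop above
sample_rows = [
    {'store_name': '"Example Store A"', 'lat': '47.61', 'lng': '-122.33'},
    {'store_name': '"Example Store B"', 'lat': 'not-a-number', 'lng': '-122.40'},
]
stores = pd.DataFrame(sample_rows)

# Strip the leftover quotes and whitespace from the store names
stores['store_name'] = stores['store_name'].str.strip('" ')

# Convert coordinates to numbers; unparseable values become NaN
for column in ('lat', 'lng'):
    stores[column] = pd.to_numeric(stores[column], errors='coerce')

# Drop rows that have no usable coordinate pair
stores = stores.dropna(subset=['lat', 'lng'])

# index=False keeps the DataFrame index out of the CSV
buffer = io.StringIO()
stores.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Whether to drop the index depends on how the Flask app reads the file; the call in the notebook keeps it as an unnamed first column.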

### 2) McDonald's

In [None]:
mcdonalds_url = 'https://www.mcdonalds.com/us/en-us/restaurant-locator.html'

# have Chrome navigate to that URL
chrome_browser.visit(mcdonalds_url)