# Starting Point - DRAFT
This is the starting point for a single Jupyter Notebook.
 
 
## Dependency
### 1 barnum Library
The barnum Python library is used to generate a random list of zip codes. It is installed with the following:
  'pip install barnum'
  
### 2 secrets.py
The restaurant data is collected with the Yelp API, which requires a key. This key is stored in secrets.py, which contains a single variable, "yelpKey".
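
As a rough illustration of the expected layout, the sketch below shows a secrets.py with a placeholder value and how the key ends up in the request header used later in this notebook. The key string is a placeholder, not a real key.

```python
# secrets.py (sketch): the value below is a placeholder, not a real key
yelpKey = "REPLACE_WITH_YOUR_YELP_API_KEY"

# The notebook passes the key as a bearer token in the request header
headers = {'Authorization': 'Bearer %s' % yelpKey}
print(headers['Authorization'])  # Bearer REPLACE_WITH_YOUR_YELP_API_KEY
```

Note that Python's standard library also ships a module named secrets, so the import relies on the local file being found first. If the import fails, renaming the file may be the simplest workaround.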

# 1 Configuration Settings
Contains the configurable settings for running the analysis.

In [16]:
#-- Import Libraries
import pandas as pd
import os
import math
import barnum
import requests


# Yelp API key in secrets.py; .gitignore prevents the secrets.py from being pushed to GitHub
from secrets import yelpKey


#-- Configuration Settings

#- Common Settings
# Name of the column that contains the zip code information
zipcodeColumnName = "Zipcode"

# Folder that is to contain output of different processing
outputDirectory = "AnalysisData"


#- Random Zipcodes
# Number of zip codes to gather data for; use 100 for production and 3 for testing
numRandomZipcodes =  3

# Name of the file that contains the DataFrame of the random zipcodes
randomZipcodesFileName = "randomZipcode.csv"


#- Collect Yelp Datasets
# Yelp search radius; used with API call
yelpSearchRadius = 3000

# True: read the random zipcodes from the file for the Yelp dataset; False: use the DataFrame in memory
useFileForYelpSearch = False

# Name of the file that contains DataFrame of all data returned from the Yelp API calls
yelpDataFileName = "yelpData.csv"

# Name of the file that contains DataFrame of the zipcode and counts returned from Yelp API calls
yelpDataSummaryFileName = "YelpSummaryData.csv"


# 1 Create Random Zip Codes   
This step creates a list of the random zip codes within the study area of Southern California. The barnum library is used to generate the random list. The random zip codes are verified to ensure that they are within the study area and then stored to disk.

In [17]:
#-- Create Random Zipcodes

#- Create List
hasAllZipcodes = False
randomZipcodes = []

while hasAllZipcodes == False:
    theZipCode = barnum.create_city_state_zip()[0]
    numZipCode = int(theZipCode)
    
    if (numZipCode >= 90001) and (numZipCode <= 93005):
        randomZipcodes.append(theZipCode)
        
    if (numRandomZipcodes == len(randomZipcodes)):
        hasAllZipcodes = True 

        
#- Create DataFrame
randomZipcodes_df = pd.DataFrame(randomZipcodes)

randomZipcodes_df.columns = [zipcodeColumnName]


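#- Save to Disk
# Create the output directory when it does not exist yet
os.makedirs(os.path.join(".", outputDirectory), exist_ok=True)

randomZipcodesPath = os.path.join(".", outputDirectory, randomZipcodesFileName)

randomZipcodes_df.to_csv(randomZipcodesPath)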


#- Preview Random Zipcodes
randomZipcodes_df.head()

Unnamed: 0,Zipcode
0,92239
1,90277
2,90307
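
The sampling loop above can also be written as a small helper that rejects duplicates and gives up after a fixed number of draws. This is only a sketch under stated assumptions: the names isInStudyArea, collectRandomZipcodes and drawZipcode are hypothetical, and a seeded stand-in generator replaces barnum so the example runs on its own.

```python
import random

# Study-area bounds taken from the cell above
MIN_ZIPCODE = 90001
MAX_ZIPCODE = 93005


def isInStudyArea(zipCode):
    """Return True when the zip code string falls inside the study-area range."""
    return MIN_ZIPCODE <= int(zipCode) <= MAX_ZIPCODE


def collectRandomZipcodes(drawZipcode, count, maxDraws=100000):
    """Draw zip codes until `count` unique in-area values are found.

    drawZipcode is a hypothetical zero-argument callable returning a zip code
    string; in the notebook it would wrap barnum.create_city_state_zip()[0].
    """
    found = []
    seen = set()
    for _ in range(maxDraws):
        candidate = drawZipcode()
        if isInStudyArea(candidate) and candidate not in seen:
            seen.add(candidate)
            found.append(candidate)
        if len(found) == count:
            return found
    raise RuntimeError(f"Only found {len(found)} zip codes after {maxDraws} draws")


# Stand-in generator for demonstration only; it does not use barnum
rng = random.Random(0)
sample = collectRandomZipcodes(lambda: str(rng.randint(10000, 99999)), 3)
print(sample)
```

Rejecting duplicates is a design choice rather than something the original loop does. Without it the same zip code can be searched more than once.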


# 2 Collect Yelp Data
For each of the random zip codes, the Yelp Business Search API (https://www.yelp.com/developers/documentation/v3/business_search) is used to collect two datasets:

* All of the businesses that are filtered for "restaurant" and "gluten_free"
  * The data is stored in an individual file in the output directory, named YelpData_<Zipcode>.csv
* All of the businesses that are filtered for only "restaurant"
  * The data is stored in an individual file in the output directory
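
Because the search uses a radius around the zip code, a response can include businesses located in neighbouring zip codes. This appears to be why the log further down reports fewer businesses found than the API total. The sketch below shows that filtering step together with the price fallback, using hand-written sample records. The helper names keepBusinessesInZipcode and priceOrNA are hypothetical and not part of the notebook.

```python
def keepBusinessesInZipcode(businesses, searchZipcode):
    """Keep only the businesses whose location zip code equals the searched one."""
    return [
        business for business in businesses
        if business.get('location', {}).get('zip_code') == searchZipcode
    ]


def priceOrNA(business):
    """Return the price level, or 'NA' when the record has no 'price' key."""
    return business.get('price', 'NA')


# Hand-written records shaped like the fields the notebook reads; not real Yelp data
sampleBusinesses = [
    {'id': 'a', 'location': {'zip_code': '90277'}, 'price': '$$'},
    {'id': 'b', 'location': {'zip_code': '90278'}},
    {'id': 'c', 'location': {'zip_code': '90277'}},
]

kept = keepBusinessesInZipcode(sampleBusinesses, '90277')
print([(b['id'], priceOrNA(b)) for b in kept])  # [('a', '$$'), ('c', 'NA')]
```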
 
This data collection uses a number of functions.
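
The search function pages through results 50 records at a time using an offset. The sketch below shows that pattern in isolation, under stated assumptions: collectAllPages and fetchPage are hypothetical names, and an in-memory stand-in replaces the real API call. It stops once the next offset reaches the reported total, and it also stops when a request fails.

```python
def collectAllPages(fetchPage, pageSize=50):
    """Collect every record from an offset-paginated source.

    fetchPage is a hypothetical callable (offset, limit) -> dict with the keys
    'businesses' (list) and 'total' (int), mirroring the response fields the
    notebook reads. It returns None when a request fails.
    """
    records = []
    offset = 0
    while True:
        page = fetchPage(offset, pageSize)
        if page is None:
            break  # stop instead of looping on a failed request
        records.extend(page['businesses'])
        offset += pageSize
        if offset >= page['total']:
            break
    return records


# In-memory stand-in for the API, for demonstration only
fakeBusinesses = [{'id': n} for n in range(141)]


def fakeFetch(offset, limit):
    """Return one page of the fake data in the same shape as an API response."""
    return {
        'businesses': fakeBusinesses[offset:offset + limit],
        'total': len(fakeBusinesses),
    }


allRecords = collectAllPages(fakeFetch)
print(len(allRecords))  # 141
```

With 141 records and a page size of 50, the stand-in is called with offsets 0, 50 and 100, which matches the request pattern in the log output of this notebook.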

In [24]:
def getDataForZipcode(isGlutenFreeSearch, searchZipCode):
    ''' Searches the Yelp API to get the businesses that satisfy the filter
    
    Accepts : isGlutenFreeSearch (bool) TRUE- search for gluten free term FALSE- search just for restaurant
                searchZipCode (str) zip code to search for records within
    
    Returns : (dictionary) contains information of business for the zip code
                ID: (str) Unique Yelp ID for the business
                Name: (str) Name of the business
                Zipcode: (str) Location of the business
                Latitude: (num) coordinate of the business location
                Longitude: (num) coordinate of the business location
                Price: (str) Price level of the business. Value is one of $, $$, $$$, $$$$ and NA
                Rating: (num) Rating for this business (value ranges from 1, 1.5, ... 4.5, 5)
                IsGlutenFree: (num) 1 - used with the gluten free filter 0 not used with gluten free search
    '''
    
    #- Prepare Search
    # Source Url
    baseYelpUrl = "https://api.yelp.com/v3/businesses/search"

    # API Key passed through header
    headers = {
            'Authorization': 'Bearer %s' % yelpKey,
    }
    
    # Search Term
    searchTerm = 'restaurant'
    
    if (isGlutenFreeSearch == True):
        searchTerm = 'gluten_free,restaurant'
    
    
    # Dictionary stores data
    yelpData = {
        'ID': [],
        'Name': [],
        'Zipcode': [],
        'Latitude': [],
        'Longitude': [],
        'Price' : [],
        'Rating' : [],
        'IsGlutenFree': [],
    }
    
    
    #- Search
    #  API limits 50 records being returned at once; must loop and request offset of results to get all records
    recordLimit = 50
    currentOffset = 0
    hasMoreData = True
    
    while hasMoreData == True:
        
        #- Prepare Parameters
        parameters = {
            'location': searchZipCode,
            'term': searchTerm,
            'limit': recordLimit,
            'radius': yelpSearchRadius,
            'offset': currentOffset,
            }
        
        #- Request
        print(f"  Requesting data. Offset: {currentOffset}")
        
        response = requests.request('GET', baseYelpUrl, headers=headers, params=parameters)
        
        
        #- Check Response
        if (response.status_code == requests.codes.ok):
            
            # Get Json from Response
            responseJson = response.json()
            
            
            # Search Businesses
            for business in responseJson['businesses']:
                
                # Determine Use Business
                useBusiness = checkBusinessForUsage(business, searchZipCode)
                
                if (useBusiness == True):
                    
                    # Populate Dictionary with Business Information
                    yelpData['ID'].append(business['id'])
                    yelpData['Name'].append(business['name'])
                    yelpData['Zipcode'].append(business['location']['zip_code'])
                    
                    yelpData['Latitude'].append(business['coordinates']['latitude'])
                    yelpData['Longitude'].append(business['coordinates']['longitude'])
                    
                    yelpData['Price'].append(getPriceForBusiness(business))
                    yelpData['Rating'].append(business['rating'])
                    
                    # Update search type
                    if (isGlutenFreeSearch == True):
                        yelpData['IsGlutenFree'].append(1)
                    else:
                        yelpData['IsGlutenFree'].append(0)
          
        
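        #- Prepare for Next search
        # Stop when the request failed; responseJson is not set for a failed request
        if (response.status_code != requests.codes.ok):
            print(f"  Request failed. Status code: {response.status_code}")
            break
        
        # API only supports 50 records at a time; must query with offset
        currentOffset = (currentOffset + recordLimit)
        
        # Use >= so a total that is an exact multiple of recordLimit does not trigger an extra, empty request
        if (currentOffset >= responseJson['total']):
            print(f"Collected all data. Current Offset: {currentOffset}  Total: {responseJson['total']}")
            hasMoreData = False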
    
                  
    #- Metadata on Data
    print(f"Total businesses found: {len(yelpData['ID'])}  Search Term: {searchTerm}")
               
          
    #- Return data from function
    return yelpData

In [8]:
def checkBusinessForUsage(businessInfo, searchZipcode):
    ''' Determines if the business can be used in the Analysis
    
    Accepts : businessInfo (dictionary) contains the metadata for individual business 
                searchZipcode (str) zip code searching for data within
    
    Return : bool TRUE- business meets criteria, able to use FALSE- unable to use business
    '''
    
    #- Check Within Search Zipcode
    businessZipCode = businessInfo['location']['zip_code']
    
    if (businessZipCode != searchZipcode):
        return False
    
    
    return True

In [9]:
def getPriceForBusiness(businessInfo):
    ''' Gets the price for a business; not all businesses contain this property within the JSON;
    when not found just uses NA.
    
    Accepts : businessInfo (dictionary) metadata on an individual business
    
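    Returns : (str) value from price tag, or 'NA' when the business has no price
    '''
    try:
        
        return businessInfo['price']
    
    # Only a missing 'price' key is expected here; other errors should surface
    except KeyError:
        return 'NA'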

In [25]:
#-- Collect Yelp Datasets

#- Get Random Zipcodes
if (useFileForYelpSearch == True):
    randomZipcodesPath = os.path.join(".", outputDirectory, randomZipcodesFileName)
    
    randomZipcodes_df = pd.read_csv(randomZipcodesPath)

else:
    if (randomZipcodes_df is None):
        raise Exception("Unable to collect Yelp dataset; missing reference to randomZipcodes_df")

        
#- Prepare Variables
zipcodeSummary = {
    zipcodeColumnName: [],
    'Count_GlutenFree': [],
    'Count_Restaurant': []
}

yelpData_df = None
hasFirstYelpData = True


#- Collect Data
for index, row in randomZipcodes_df.iterrows():
    
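    #- Get Zipcode
    # Read by column name; position 0 can be the saved index column when the file is loaded from disk
    searchZipcode = str(row[zipcodeColumnName])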
    
    
    #- Message
    print(f"-> Search -> {searchZipcode}")
    
    
    #- Get Data from Yelp: Gluten Free Search
    yelpDataForZipcode = getDataForZipcode(True, searchZipcode)

    # Create DataFrame
    yelpDataForZipcode_df = pd.DataFrame(yelpDataForZipcode)
    
    # Determine number of records
    countYelpDataForZipcode = yelpDataForZipcode_df.shape[0]
    
    
    #- Merge to Master DataFrame
    if (hasFirstYelpData == True):
        hasFirstYelpData = False
        yelpData_df = yelpDataForZipcode_df
    
    else:
        yelpData_df = pd.concat([yelpData_df, yelpDataForZipcode_df])
    
    
    #- Get Data from Yelp: All Restaurants
    yelpDataForZipcode = getDataForZipcode(False, searchZipcode)
    
    # Create DataFrame
    yelpDataForZipcode_df = pd.DataFrame(yelpDataForZipcode)
    
    # Determine number of records
    countYelpDataSummaryForZipcode = yelpDataForZipcode_df.shape[0]
    
    
    #- Merge to master DataFrame
    yelpData_df = pd.concat([yelpData_df, yelpDataForZipcode_df])
    
    
    #- Update Summary
    zipcodeSummary[zipcodeColumnName].append(searchZipcode)
    
    zipcodeSummary['Count_GlutenFree'].append(countYelpDataForZipcode)
    zipcodeSummary['Count_Restaurant'].append(countYelpDataSummaryForZipcode)  
    

#- Message
print(" ")
print("Completed getting data from Yelp API")
    
    
#- Export: Yelp Data
yelpDataFilePath = os.path.join('.', outputDirectory, yelpDataFileName)

yelpData_df.to_csv(yelpDataFilePath)


#- Export: Yelp Summary Data
yelpDataSummaryFilePath = os.path.join('.', outputDirectory, yelpDataSummaryFileName)

yelpDataSummary_df = pd.DataFrame(zipcodeSummary)

yelpDataSummary_df.to_csv(yelpDataSummaryFilePath)


#- Completed Message
print("Completed export of data")

-> Search -> 92239
  Requesting data. Offset: 0
Collected all data. Current Offset: 50  Total: 0
Total businesses found: 0  Search Term: gluten_free,restaurant
  Requesting data. Offset: 0
Collected all data. Current Offset: 50  Total: 0
Total businesses found: 0  Search Term: restaurant
-> Search -> 90277
  Requesting data. Offset: 0
  Requesting data. Offset: 50
  Requesting data. Offset: 100
Collected all data. Current Offset: 150  Total: 141
Total businesses found: 75  Search Term: gluten_free,restaurant
  Requesting data. Offset: 0
  Requesting data. Offset: 50
  Requesting data. Offset: 100
  Requesting data. Offset: 150
  Requesting data. Offset: 200
  Requesting data. Offset: 250
Collected all data. Current Offset: 300  Total: 255
Total businesses found: 168  Search Term: restaurant
-> Search -> 90307
  Requesting data. Offset: 0
  Requesting data. Offset: 50
Collected all data. Current Offset: 100  Total: 95
Total businesses found: 0  Search Term: gluten_free,restaurant
  Requ

In [26]:
#-- Preview Yelp Data
yelpData_df.head()

Unnamed: 0,ID,Name,Zipcode,Latitude,Longitude,Price,Rating,IsGlutenFree
0,OAXv2Q6qjltKJCWs0CnRBQ,Rakkan Ramen-Redondo Beach,90277,33.83101,-118.38534,$$,4.5,1.0
1,5QyigJ3q76yfuSl-5WU1XA,Wildflower Cafe,90277,33.832359,-118.384704,$$,4.0,1.0
2,z5FvyocYW0621b5vuUy5Ng,Dominique's Kitchen,90277,33.83332,-118.38476,$$,4.5,1.0
3,zbiOS63Unr2fqGXiPgFLpA,Mama Ds Redondo Beach,90277,33.82491,-118.38579,$$,4.5,1.0
4,BIrM17k1ApiS-IR1HTTNjw,Kirari West Bake Shop,90277,33.853006,-118.389888,$$,4.5,1.0


In [27]:
#-- Preview Yelp Summary Data
yelpDataSummary_df.head()

Unnamed: 0,Zipcode,Count_GlutenFree,Count_Restaurant
0,92239,0,0
1,90277,75,168
2,90307,0,0
