StudySpotter -- Perry Fox -- 8/3/2023

References:    
[Article](https://medium.com/p/17e48f8f25b1)  
[Places API overview and docs](https://developers.google.com/maps/documentation/places/web-service/overview)    
[Text Search section](https://developers.google.com/maps/documentation/places/web-service/search-text)  
[Python Requests library](https://realpython.com/python-requests/)
  
[API Credentials Page for key](https://console.cloud.google.com/apis/credentials?project=testing-330118)

# Gathering Candidate Study/Work Spots

## Overview
**Purpose:** This notebook will be used to gather candidate Spot names and addresses using the `Google Places API` for further research and scouting. In the prototype phase, this will just gather cafes in Manhattan, but the functions can be repurposed for a wider scope later on.  
  
**Google Maps API:** To automate the process of a searching for a text string in Google Maps we'll construct a URL to access the Google Maps API. We'll specify that we want to use the Place API, `/place`, with the the text search component, `/textsearch`:       
- `Base URL = https://maps.googleapis.com/maps/api/place/textsearch/`  <br>  

With this, we can append an ambiguous address query (a text string), along with a place type keyword ([full list of keywords](https://developers.google.com/maps/documentation/places/web-service/supported_types)), among other optional parameters, to constrain the results 
  

**API Output:** The API will return a list of up to 60 establishments results across 3 pages, 20 per page, in a JSON object that includes metadata.    
**Costs:**  \\$40 for every 1,000 requests. \\$200 free/month per individual account. About 5,000 free requests before you start paying (each page = 1 request). 
  
**Downstream:** The relevant data extracted from the JSON object will then be fed into our `spots_master_list` Sheets file which will undergo further evaluation. 

## Code
The rest of this notebook will be in two sections: 
- A broken down walkthrough of the functionality so the process can be easily understood and improved upon
- The same code contained in a function that can be repurposed for other areas beyond Manhattan

### Code breakdown

User inputs: 
- zip codes 
- URL parameters (search string, place keywords)

Here, for the purpose of investigation, we'll make the GET request with the following specified parameters:
- search sting: "coffee shop nyc 10011"
- keyword: cafe

---

#### Get data from API

In [1]:
import requests, json
import time # time.sleep(2) required, won't return results if you request all at once with no breaks

apikey = ''

url_root = 'https://maps.googleapis.com/maps/api/place/textsearch/json?'
query = 'coffee+nyc+10011'
type_keyword = 'cafe' # ['bakery','book_store','cafe',]

# key value labels come from the docs. 'key' and 'query' required, 'type' is an optional
params = {
    'key' : apikey,
    'query': query,
    'type': type_keyword,
    'pagetoken': None  # This is where a returned `next_page_token` would go.
}

# making get request
response = requests.get(url_root, params=params)

In [2]:
# confirm the status code, 200 means OK
# Note for error catching: if you use this method in a conditional expression it will evaluate
# to True if 200 <= status code <= 400, else False. Pretty cool.
response.status_code

200

In [None]:
# check out the url we constucted for the request. (Great for debugging!)
response.url

#### Examine and get data from JSON response

In [4]:
# not quite a json object....
type(response)

requests.models.Response

In [5]:
# need to convert type here and then we can start investigating
spots_json = response.json()
spots_json.keys()

dict_keys(['html_attributions', 'results', 'status'])

In [6]:
'next_page_token' in spots_json.keys()

False

In [7]:
spots_json['html_attributions']

[]

In [8]:
npt = spots_json['next_page_token']
npt

KeyError: 'next_page_token'

** This token is then used in a new request. Pass this value to the `pagetoken` parameter of a new request to see the next set of results. **
Notes: 
- setting the pagetoken will cause any other parameters to be ignored. 
- Each request, including using the `next_page_token` counts as a single request. i.e. max 20 results per request!
- There is a delay of a few seconds b/w when a `next_page_token` is issued and when it will be valid (docs don't get more specific than this). Try different time.sleep() values. Not waiting long enough results in an `INVALID_REQUEST` response. 
- !! If there is no next_page_token, the key won't even be present in the .keys() output--This is the point where you check for the omission !!

In [9]:
spots_json['status'] # might want to set up a catch in case a future request is like Gerard Way

'OK'

`results` contains what we're ultimately looking for-- establishment names and addresses..

In [10]:
print(f"Results is a {type(spots_json['results'])} with {len(spots_json['results'])} items") 


Results is a <class 'list'> with 3 items


Each item in this list contains information about one of our search results.

In [11]:
# take a look at an entry
from pprint import pprint
pprint(spots_json['results'][0])

{'business_status': 'OPERATIONAL',
 'formatted_address': '197 E 3rd St, New York, NY 10009, United States',
 'geometry': {'location': {'lat': 40.723004, 'lng': -73.983206},
              'viewport': {'northeast': {'lat': 40.72426757989272,
                                         'lng': -73.98191917010728},
                           'southwest': {'lat': 40.72156792010728,
                                         'lng': -73.98461882989271}}},
 'icon': 'https://maps.gstatic.com/mapfiles/place_api/icons/v1/png_71/shopping-71.png',
 'icon_background_color': '#4B96F3',
 'icon_mask_base_uri': 'https://maps.gstatic.com/mapfiles/place_api/icons/v2/shopping_pinlet',
 'name': 'Book Club',
 'opening_hours': {'open_now': True},
 'photos': [{'height': 6240,
             'html_attributions': ['<a '
                                   'href="https://maps.google.com/maps/contrib/103976492282669338352">Yusuf '
                                   'Esenboga</a>'],
             'photo_reference': 'AUacShhL

In [12]:
spots_json['results'][0].keys()

dict_keys(['business_status', 'formatted_address', 'geometry', 'icon', 'icon_background_color', 'icon_mask_base_uri', 'name', 'opening_hours', 'photos', 'place_id', 'plus_code', 'rating', 'reference', 'types', 'user_ratings_total'])

Now we're zeroing in. Make a list from all the `name` and `formatted_address` fields:

In [13]:
import pandas as pd

In [14]:
place_id = []
name = []
address = []
for i in range(len(spots_json['results'])):
    name.append(spots_json['results'][i]['name'])
    address.append(spots_json['results'][i]['formatted_address'])
    place_id.append(spots_json['results'][i]['place_id'])

In [17]:
df = pd.DataFrame({
              'place_id':place_id,
              'name':name, 
              'address':address})
df

Unnamed: 0,place_id,name,address
0,ChIJwxHj5iBZwokR_pY9e5OhHNs,Book Club,"197 E 3rd St, New York, NY 10009, United States"
1,ChIJSaRjC49ZwokR402Rpxj6i9Q,Housing Works Bookstore,"126 Crosby St, New York, NY 10012, United States"
2,ChIJw2lMFL9ZwokRN70Y5-MQDow,Posman Books,"75 9th Ave, New York, NY 10011, United States"


And there it is! 
Next step is to put this all in one function with some error catching lines.  
Also need a validation function.  
  
Plan is to then make a script that takes user input of list a zip codes, and list of search strings.
- For each zip code, and for each search string, make a request, validate the data and append it to a dataframe.
- At the end of each request, check if Next Page Token is not null, and if so run another request before moving on to the next search string or zip code. 
- Once done with the iterations, the script should append the data of the complete dataframe to the master_list tab in spots_master_list on Google Sheets. 

---