# SafeGraph Interview Problem

## About
SafeGraph provided three questions as part of the hiring process for a Technical Product Manager. Below are my answers for the three questions.

You can view a pdf of the questions [here](https://github.com/bootstrapt/safegraph-practice-problems/blob/main/Technical_Product_Manager_API_Written_Interview.pdf).

Viewing notebook:
- Source: [github](https://github.com/bootstrapt/safegraph-practice-problems)
- Notebook w/ code (and output): [github](https://github.com/bootstrapt/safegraph-practice-problems/blob/main/SafeGraph%20Interview%20Problem.ipynb)
- Notebook w/o code (just output): [github pages](https://bootstrapt.github.io/safegraph-practice-problems/)

## Table of Contents
- [Prerequisites](#Prerequisites)
- [Question 1](#Question-1)
- [Question 2](#Question-2)
- [Question 3](#Question-3)

## Prerequisites
Load and initialize notebook extensions, install R packages as needed, load dependencies, and set up globals.

In [1]:
# load extensions (once per session)
%load_ext dotenv
%load_ext rpy2.ipython
print('✔ Extensions loaded')

✔ Extensions loaded


In [2]:
# initialize dotenv
%dotenv
print('✔ dotenv initialized')

✔ dotenv initialized


In [3]:
# install R packages, if missing
%R if (!require("ggplot2", quietly=TRUE)) install.packages("ggplot2", repos='http://cran.us.r-project.org', quiet=TRUE)
%R if (!require("httr", quietly=TRUE)) install.packages("httr", repos='http://cran.us.r-project.org', quiet=TRUE)
%R if (!require("jsonlite", quietly=TRUE)) install.packages("jsonlite", repos='http://cran.us.r-project.org', quiet=TRUE)
print('✔ R packages installed')

✔ R packages installed


In [4]:
# load packages and setup globals

## load environment variables
from os import environ
SAFEGRAPH_KEY = environ.get('SAFEGRAPH_KEY')

## setup safegraph client from their official docs
import safegraphql.client as sgql
SAFEGRAPH_CLIENT = sgql.HTTP_Client(apikey = SAFEGRAPH_KEY)
SAFEGRAPH_URL = 'https://api.safegraph.com/v1/graphql'

## handling http requests
import requests
import json

## for pretty json output
from pprint import pprint

print('✔ python dependencies loaded')

✔ python dependencies loaded


## Question 1 

### Question
You are going to launch a new API meant for data science users and you want to have at least one client library ready at launch. Do you build a client in R, Python or both? How do you decide?

### Answer
That depends on the expected behavior and knowledge of the user base.

Initially I had these questions:
1. What percentage is expected to use R vs Python? 
    1. We should be able to put together an estimate with current usage data and the expected profile of the new API users. 
    2. If they are the same, then which group uses any current client libraries vs raw API calls?
2. Do more R users know Python or the other way around? 
    1. If there is more familiarity with one over the other, the language with the most reach should be favored.
3. What are the common use cases?
    1. If they are mostly basic data retrieval, it might make sense to roll out an MVP in both languages that supports the most commonly used GET requests.
4. What are the major friction points?
    1. Usually things like auth, batching, pagination, etc. Removing friction should be the top priority.
    
Eventually I decided to implement a basic call myself. I first tried the query from the cURL version of the directions, but something about the JSON encoding wasn't working correctly. The query in the cURL example looked like a serialized version of the GraphQL query, so I passed the GraphQL query as a multiline string and let Python's standard `json` library do the encoding (see [FIG 1A](#Figure-1A:-Basic-SafeGraph-API-request-in-Python)). Success! I also implemented it using the provided Python client library (see [FIG 1B](#Figure-1B:-Basic-SafeGraph-API-request-using-official-Python-client-library)). Finally, I implemented it in R, cheating a bit by reusing the payload string generated by Python (see [FIG 1C](#Figure-1C:-Basic-SafeGraph-API-request-in-R)). The three responses are formatted differently, but the data itself matched in all three cases!

Given what I know now, I would still want answers to the questions above, but I would lean towards shipping a more complete solution in one of the two languages. Authorization isn't too hard, so if we can't get a second client library out in time, we should be able to produce docs with common use cases for the other language. For reference, I had never used R before and figured out a basic API call in less than an hour.
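
As a rough illustration of how little code the auth and encoding steps need, here is a minimal sketch of a helper that assembles the request pieces without sending anything. The function name `build_graphql_request` and its return shape are hypothetical and not part of any SafeGraph library; the URL and header names are the ones used in the figures below.

```python
import json

SAFEGRAPH_URL = 'https://api.safegraph.com/v1/graphql'


def build_graphql_request(query, api_key, variables=None):
    """Return keyword arguments for an HTTP POST (hypothetical helper)."""
    payload = {'query': query}
    if variables is not None:
        payload['variables'] = variables
    return {
        'url': SAFEGRAPH_URL,
        'headers': {'apikey': api_key, 'content-type': 'application/json'},
        # json.dumps escapes the newlines and quotes inside the query string
        'data': json.dumps(payload),
    }


example_query = '''
query {
  lookup(placekey: "222-224@5vg-7gr-6kz") {
    safegraph_core {
      location_name
    }
  }
}'''

# 'MY_API_KEY' is a placeholder, not a real key
kwargs = build_graphql_request(example_query, 'MY_API_KEY')
print(kwargs['headers'])
print(kwargs['data'])
```

The returned dict could then be passed along as `requests.post(**kwargs)`, which keeps the header and encoding details in one place.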

#### Figure 1A: Basic SafeGraph API request in Python
Based on the `cURL` version of the [docs](https://docs.safegraph.com/reference#lookup-placekey). It uses Python to encode the query string. Authorization handled with a simple header. 

In [5]:
# FIG 1A: Basic SafeGraph API request in Python

querystring = '''
query {
  lookup(placekey: "222-224@5vg-7gr-6kz") {
    safegraph_core {
      placekey
      latitude
      longitude
      street_address
      city
      region
      postal_code
      iso_country_code
      parent_placekey
      location_name
      safegraph_brand_ids
      brands {
        brand_id
        brand_name
      }
      top_category
      sub_category
      naics_code
      phone_number
      open_hours
      category_tags
      opened_on
      closed_on
      tracking_closed_since
      geometry_type
    }
  }
}'''
payload = {'query': querystring}
payload_enc = json.dumps(payload)
headers = {
    'apikey': SAFEGRAPH_KEY,
    'content-type': 'application/json'
}
response = requests.request('POST', 
                            SAFEGRAPH_URL,
                            headers=headers,
                            data=payload_enc)
pprint(response.json())

{'data': {'lookup': {'safegraph_core': {'brands': [{'brand_id': 'SG_BRAND_f116acfe9147494063e58da666d1d57e',
                                                    'brand_name': 'starbucks '
                                                                  'coffee'}],
                                        'category_tags': ['Snacks',
                                                          'Counter Service',
                                                          'Dessert',
                                                          'Tea House',
                                                          'Coffee Shop',
                                                          'Bakery'],
                                        'city': 'San Francisco',
                                        'closed_on': None,
                                        'geometry_type': 'POLYGON',
                                        'iso_country_code': 'US',
                                        'latitude

#### Figure 1B: Basic SafeGraph API request using official Python client library
Based on the `python` version of the [docs](https://docs.safegraph.com/reference#lookup-placekey). 

In [6]:
# FIG 1B: Basic SafeGraph API request using official Python client library

cols = [
    'latitude',
    'longitude',
    'street_address',
    'city',
    'region',
    'postal_code',
    'iso_country_code',
    'parent_placekey',
    'location_name',
    'safegraph_brand_ids',
    'brands',
    'top_category',
    'sub_category',
    'naics_code',
    'phone_number',
    'open_hours',
    'category_tags',
    'opened_on',
    'closed_on',
    'tracking_closed_since',
    'geometry_type',
]
# the placekey below resolves to a Starbucks in San Francisco (see output)
starbucks = SAFEGRAPH_CLIENT.lookup(product   = 'core', 
                                    placekeys = '222-224@5vg-7gr-6kz', 
                                    columns   = cols)
pprint(starbucks.to_dict())

{'brands': {0: [{'brand_id': 'SG_BRAND_f116acfe9147494063e58da666d1d57e',
                 'brand_name': 'starbucks coffee'}]},
 'category_tags': {0: ['Snacks',
                       'Counter Service',
                       'Dessert',
                       'Tea House',
                       'Coffee Shop',
                       'Bakery']},
 'city': {0: 'San Francisco'},
 'closed_on': {0: None},
 'geometry_type': {0: 'POLYGON'},
 'iso_country_code': {0: 'US'},
 'latitude': {0: 37.769035},
 'location_name': {0: 'Starbucks'},
 'longitude': {0: -122.42775},
 'naics_code': {0: 722515},
 'open_hours': {0: '{ "Mon": [["5:30", "19:30"]], "Tue": [["5:30", "19:30"]], '
                   '"Wed": [["5:30", "19:30"]], "Thu": [["5:30", "19:30"]], '
                   '"Fri": [["5:30", "19:30"]], "Sat": [["5:30", "19:30"]], '
                   '"Sun": [["5:30", "19:30"]] }'},
 'opened_on': {0: None},
 'parent_placekey': {0: '222-226@5vg-7gr-6kz'},
 'phone_number': {0: None},
 'placekey': {0: '2

#### Figure 1C: Basic SafeGraph API request in R
Based on the `python` and `cURL` versions of the [docs](https://docs.safegraph.com/reference#lookup-placekey). Since the payload encoded and serialized by Python already works, I just pass that in along with my API key. I then use R's `httr` package to send the request. Because the body is already a JSON string, `httr` should send it as-is.

In [7]:
%%R -i SAFEGRAPH_KEY -i payload_enc

# FIG 1C: Basic SafeGraph API request in R

library(httr)

r <- POST('https://api.safegraph.com/v1/graphql',
          add_headers('apikey'=SAFEGRAPH_KEY, 
                      'content-type'='application/json'),
          body = payload_enc,
          encode = 'json')
r
str(content(r, 'parsed'))

List of 2
 $ data      :List of 1
  ..$ lookup:List of 1
  .. ..$ safegraph_core:List of 22
  .. .. ..$ placekey             : chr "222-224@5vg-7gr-6kz"
  .. .. ..$ latitude             : num 37.8
  .. .. ..$ longitude            : num -122
  .. .. ..$ street_address       : chr "2020 Market St"
  .. .. ..$ city                 : chr "San Francisco"
  .. .. ..$ region               : chr "CA"
  .. .. ..$ postal_code          : chr "94114"
  .. .. ..$ iso_country_code     : chr "US"
  .. .. ..$ parent_placekey      : chr "222-226@5vg-7gr-6kz"
  .. .. ..$ location_name        : chr "Starbucks"
  .. .. ..$ safegraph_brand_ids  : chr "SG_BRAND_f116acfe9147494063e58da666d1d57e"
  .. .. ..$ brands               :List of 1
  .. .. .. ..$ :List of 2
  .. .. .. .. ..$ brand_id  : chr "SG_BRAND_f116acfe9147494063e58da666d1d57e"
  .. .. .. .. ..$ brand_name: chr "starbucks coffee"
  .. .. ..$ top_category         : chr "Restaurants and Other Eating Places"
  .. .. ..$ sub_category         : chr "

## Question 2

### Question
In the first iteration of an API, the engineer creates a response that looks like this:
![payload snippet](img/payload_snippet.png)
You notice that there is both a “safegraph_brand_ids” field and a “brands” field. Do you keep both? If not, which one do you keep? How do you decide?

### Answer
> “Don't ever take a fence down until you know the reason it was put up.”  
> ― G. K. Chesterton

Again, my approach would be to gather a little more data before making a decision. Below I outline my thinking and approach.

Main questions:
1. What use cases did the developer believe they were solving? Are there other ways to solve them that save us complexity?
    1. For example, if it's meant as an index, customers can compute that themselves. Alternatively, we could return a list of (id, name) tuples, or a dict keyed on brand name with the id as the value (or the reverse if brand names are not unique).
2. What is the ongoing maintenance cost of including both vs one?
    1. If it's just a convenience, but it's cheap and helps with overall satisfaction (and therefore retention and the likelihood of additional sales / upsells), then we might as well keep it.

Given the response above, this looks like it is based on actual data, so there should be some users we can talk to. Who uses each field, for what, and what does it cost us? Based on those answers we can either sunset the redundant field, with clear docs on other patterns that solve the same problems, or keep both and update the docs with clear use cases and guidance on what to do if the two fields ever disagree.
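
To make the "customers can compute it themselves" option concrete, here is a minimal sketch that derives the ID list and an ID-to-name mapping from the `brands` field, plus a check for the case where the two fields disagree. The sample record reuses the Starbucks values from Question 1. Treating `safegraph_brand_ids` as a list of strings is an assumption, and the helper names are hypothetical.

```python
# Sample record shaped like the safegraph_core payload (values from Question 1).
# Assumption: safegraph_brand_ids is a list of ID strings.
record = {
    'safegraph_brand_ids': ['SG_BRAND_f116acfe9147494063e58da666d1d57e'],
    'brands': [
        {'brand_id': 'SG_BRAND_f116acfe9147494063e58da666d1d57e',
         'brand_name': 'starbucks coffee'},
    ],
}


def brand_ids_from_brands(rec):
    """Derive the ID list from the richer `brands` field."""
    return [brand['brand_id'] for brand in rec.get('brands') or []]


def brand_ids_match(rec):
    """True when both fields name the same set of brand IDs."""
    listed = set(rec.get('safegraph_brand_ids') or [])
    return listed == set(brand_ids_from_brands(rec))


# Keyed on the ID because brand names may not be unique
id_to_name = {brand['brand_id']: brand['brand_name'] for brand in record['brands']}

print(brand_ids_from_brands(record))
print(brand_ids_match(record))
print(id_to_name)
```

If a check like `brand_ids_match` ever fails on real data, that is a signal the two fields are maintained separately, which feeds directly into the maintenance-cost question above.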

## Question 3

### Question
How would you improve this example code snippet in the docs?
![code snippet](img/code_snippet.png)

### Answer
In order to understand the code a little better, I decided to implement it.

Notes:  
- As written, the code does not work because the payload string was cut off in the screenshot ([see FIG 3A](#Figure-3A:-As-written))
- Luckily, it looks like the same string is used in the `cURL` instructions of the docs. Even so, we still get an error when the API parses the payload ([see FIG 3B](#Figure-3B:-Corrected-payload-string))
- We were able to construct a valid query earlier, so let's try that method. This time we will use the `GraphQL` portion of the docs ([see FIG 3C](#Figure-3C:-Updating-payload-to-known-good-pattern))
- That works! Let's use the official Python client library to validate the results. Looks good! ([see FIG 3D](#Figure-3D:-Checking-our-work))

To improve the example snippet ([see FIG 3C](#Figure-3C:-Updating-payload-to-known-good-pattern)):
1. fix the payload using the multiline GraphQL query string literal
2. explicitly use `json.dumps()` to encode the string literal
3. use `response.json()` instead of `response.text` (or use `json.loads(response.text)`)  

Taking the steps above will make the example snippet functional, easier to maintain, and more usable. 
- Functional because the request now works
- Easier to maintain because the query is now readable
- More usable because we're outputting a Python dict

Ideally we would not be using string literals to build queries. Debugging can be a pain, which is why I opted for trying a workaround first.
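
One way to move away from hand-written literals, sketched under assumptions: generate the query text from a list of field names, similar in spirit to the `columns` argument of the client library. The function `build_lookup_query` is hypothetical and handles only scalar fields (nested ones such as `brands` would need their own sub-selection). It relies on `json.dumps` to quote the placekey, which lines up with GraphQL string syntax for simple ASCII values.

```python
import json


def build_lookup_query(placekey, fields):
    """Build a placekey lookup query from scalar field names (hypothetical helper)."""
    if not fields:
        raise ValueError('at least one field is required')
    field_lines = '\n'.join(f'      {name}' for name in fields)
    return (
        'query {\n'
        f'  lookup(placekey: {json.dumps(placekey)}) {{\n'
        '    safegraph_core {\n'
        f'{field_lines}\n'
        '    }\n'
        '  }\n'
        '}'
    )


query = build_lookup_query('222-224@5vg-7gr-6kz',
                           ['location_name', 'city', 'region'])
print(query)

# The encoded body that would be sent as the POST data
print(json.dumps({'query': query}))
```

This is still string building underneath, but the braces and quoting live in one tested function instead of being retyped in every snippet.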

#### Figure 3A: As written
As written, the snippet doesn't work. The payload line was truncated in the screenshot, giving us the error:

In [8]:
# FIG 3A: As written

import requests
import json

url = 'https://api.safegraph.com/v1/graphql'

payload="{\"query\":\"query {\\n\\tlookup(query: {\\n\\t\\tlocation_name: \\\"Taco Bell\\\", \\n\\t\\tstreet_address: \\\"710"
headers = {
    'apikey': SAFEGRAPH_KEY,
    'content-type': 'application/json'
}

response = requests.request('POST', url, headers=headers, data=payload)

print(response.text)

{"error":"Invalid JSON"}


#### Figure 3B: Corrected payload string
In order to send the correct payload, I will replace the one in the screenshot with one from the [docs](https://docs.safegraph.com/reference#lookup-name-address). The string looked suspiciously like the same one in the `cURL` version of the directions, so we can try that.

Still, we get an error saying the JSON is invalid.

In [9]:
# FIG 3B: Corrected payload string

payload = '{"query":"query {\n\tlookup(query: {\n\t\t\tlocation_name: \"Taco Bell\", \n\t\t\tstreet_address: \"710 3rd St\", \n\t\t\tcity: \"San Francisco\", \n\t\t\tregion: \"CA\", \n\t\t\tiso_country_code: \"US\"\n\t\t}) { \n\t\tplacekey \n\t\tsafegraph_core {\n\t\t\tlocation_name\n\t\t\tstreet_address\n\t\t\tpostal_code\n\t\t\tphone_number\n\t\t\tcategory_tags\n\t\t}\n\t}\n}","variables":{}}'
headers = {
    'apikey': SAFEGRAPH_KEY,
    'content-type': 'application/json'
}

response = requests.request('POST', SAFEGRAPH_URL, headers=headers, data=payload)

pprint(response.json())

{'error': 'Invalid JSON'}
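
A likely cause of this error can be checked locally, without calling the API. In a regular Python string literal, Python itself consumes the backslashes: `\n` becomes a real newline, `\t` a real tab, and `\"` a bare double quote, none of which are allowed unescaped inside a JSON string. The sketch below uses a shortened payload in the same escape style as FIG 3B (the shortened query is for illustration only) and compares a regular literal with a raw one.

```python
import json

# Regular literal: Python turns \n, \t and \" into raw characters
cooked = '{"query":"query {\n\tlookup(query: {location_name: \"Taco Bell\"}) { placekey }\n}","variables":{}}'

# Raw literal: the backslashes survive, so the JSON escapes stay intact
raw = r'{"query":"query {\n\tlookup(query: {location_name: \"Taco Bell\"}) { placekey }\n}","variables":{}}'


def is_valid_json(text):
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(cooked))  # False
print(is_valid_json(raw))     # True
```

Prefixing the copied payload with `r` would be the smallest change to the docs snippet, though building the payload with `json.dumps()` avoids the escaping question entirely.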


#### Figure 3C: Updating payload to known good pattern
Above we were able to make an API request using the GraphQL example as a multiline string. This time, we use the text from the `GraphQL` portion of the same [docs](https://docs.safegraph.com/reference#lookup-name-address).

Looks like it works!

In [10]:
# FIG 3C: Updating payload to known good pattern

querystring = '''
query {
  lookup(
    query: {
      location_name: "Taco Bell"
      street_address: "710 3rd St"
      city: "San Francisco"
      region: "CA"
      iso_country_code: "US"
    }
  ) {
    placekey
    safegraph_core {
      location_name
      street_address
      postal_code
      phone_number
      category_tags
    }
  }
}'''
payload = {'query': querystring}
payload_enc = json.dumps(payload)
headers = {
    'apikey': SAFEGRAPH_KEY,
    'content-type': 'application/json'
}
response = requests.request('POST', SAFEGRAPH_URL, headers=headers, data=payload_enc)
pprint(response.json())

{'data': {'lookup': {'placekey': '224-222@5vg-7gv-d7q',
                     'safegraph_core': {'category_tags': ['Counter Service',
                                                          'Late Night',
                                                          'Lunch',
                                                          'Fast Food',
                                                          'Drive Through',
                                                          'Breakfast',
                                                          'Mexican Food',
                                                          'Dinner'],
                                        'location_name': 'Taco Bell',
                                        'phone_number': '+14159791587',
                                        'postal_code': '94107',
                                        'street_address': '710 3rd St'}}},
 'extensions': {'row_count': 1, 'version_date': '1630442778__2021_08'}}
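
Since a failed request returns a body like `{'error': 'Invalid JSON'}` while a successful one nests the record under `data` and `lookup`, a small unwrapping helper can make the difference explicit. This is a minimal sketch: `unwrap_lookup` is a hypothetical name, and the check for an `errors` list is an assumption based on the general GraphQL convention rather than on observed SafeGraph output.

```python
def unwrap_lookup(body):
    """Return the `lookup` record from a parsed response, or raise on failure."""
    if 'error' in body:
        # Shape seen in FIG 3A and FIG 3B
        raise ValueError(f"API error: {body['error']}")
    if body.get('errors'):
        # Assumption: standard GraphQL error list
        raise ValueError(f"GraphQL errors: {body['errors']}")
    lookup = (body.get('data') or {}).get('lookup')
    if lookup is None:
        raise ValueError('no lookup result in response')
    return lookup


# Trimmed copy of the successful response above
ok = {'data': {'lookup': {'placekey': '224-222@5vg-7gv-d7q',
                          'safegraph_core': {'location_name': 'Taco Bell'}}},
      'extensions': {'row_count': 1}}
print(unwrap_lookup(ok)['placekey'])

try:
    unwrap_lookup({'error': 'Invalid JSON'})
except ValueError as exc:
    print(exc)
```

In a docs snippet, calling something like `unwrap_lookup(response.json())` would surface a bad payload as an exception instead of a silently printed error body.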


#### Figure 3D: Checking our work
Let's check the response against the same query sent using the `python` section of the [docs](https://docs.safegraph.com/reference#lookup-name-address). 

The formatting is a little different, but the data itself looks correct!

In [11]:
# FIG 3D: Checking our work

location_name = 'Taco Bell'
street_address = '710 3rd St'
city = 'San Francisco'
region = 'CA'
iso_country_code = 'US'

cols = [
    'location_name',
    'street_address',
    'postal_code',
    'phone_number',
    'category_tags'
]

r = SAFEGRAPH_CLIENT.lookup_by_name(
    product = 'core', 
    location_name = location_name,
    street_address = street_address,
    city = city,
    region = region,
    iso_country_code = iso_country_code,
    columns = cols
)

pprint(r.to_dict())

{'category_tags': {0: ['Counter Service',
                       'Late Night',
                       'Lunch',
                       'Fast Food',
                       'Drive Through',
                       'Breakfast',
                       'Mexican Food',
                       'Dinner']},
 'location_name': {0: 'Taco Bell'},
 'phone_number': {0: '+14159791587'},
 'placekey': {0: '224-222@5vg-7gv-d7q'},
 'postal_code': {0: '94107'},
 'street_address': {0: '710 3rd St'}}
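
The "formatting is a little different" comparison can also be done programmatically. Here is a minimal sketch, assuming the two shapes shown in FIG 3C and FIG 3D: the raw response nests fields under `safegraph_core`, while the client library's `to_dict()` output maps each column to `{row_index: value}`. The helper names are hypothetical, and the sample values are copied from the two outputs above.

```python
def flatten_raw(body):
    """Flatten a raw GraphQL lookup response into one flat dict."""
    lookup = body['data']['lookup']
    flat = dict(lookup.get('safegraph_core') or {})
    flat['placekey'] = lookup['placekey']
    return flat


def flatten_client(columns, row=0):
    """Pick one row out of a {column: {row_index: value}} dict."""
    return {name: values[row] for name, values in columns.items()}


tags = ['Counter Service', 'Late Night', 'Lunch', 'Fast Food',
        'Drive Through', 'Breakfast', 'Mexican Food', 'Dinner']

# Shape from FIG 3C (raw request)
raw_response = {
    'data': {'lookup': {'placekey': '224-222@5vg-7gv-d7q',
                        'safegraph_core': {'category_tags': tags,
                                           'location_name': 'Taco Bell',
                                           'phone_number': '+14159791587',
                                           'postal_code': '94107',
                                           'street_address': '710 3rd St'}}},
    'extensions': {'row_count': 1, 'version_date': '1630442778__2021_08'},
}

# Shape from FIG 3D (client library)
client_response = {
    'category_tags': {0: tags},
    'location_name': {0: 'Taco Bell'},
    'phone_number': {0: '+14159791587'},
    'placekey': {0: '224-222@5vg-7gv-d7q'},
    'postal_code': {0: '94107'},
    'street_address': {0: '710 3rd St'},
}

print(flatten_raw(raw_response) == flatten_client(client_response))
```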
