# Spark Streaming Foursquare Challenge - Stream Reading Notebook
Raj Prasad
July 2019

[html version](https://daddyprasad5.github.io/foursquare_client.html) 

[jupyter notebook version](https://github.com/daddyprasad5/thinkful/blob/master/foursquare_client.ipynb) 

The goal of this exercise is to create a stream of Foursquare trending venues by calling the Foursquare API and writing each response to a JSON file in my Google Drive.

Then [another notebook](https://github.com/daddyprasad5/thinkful/blob/master/foursquare_streaming_challenge.ipynb) reads that stream and displays data from it. Nothing fancy, just a print of the DataFrame.
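
To make the idea of a "stream of files" concrete, here is a minimal plain-Python sketch of what a consumer of that folder does: on each poll it picks up only the JSON files it has not seen before. This is a stand-in for illustration, not the Spark code used in the companion notebook; the function name `read_new_records` and the `seen` set are hypothetical.

```python
import json
from pathlib import Path


def read_new_records(log_dir, seen):
    """Return the contents of JSON files in log_dir that are not yet in `seen`."""
    records = []
    # Sort by file name so each poll processes files in a stable order.
    for path in sorted(Path(log_dir).glob("*.json")):
        if path.name in seen:
            continue
        with open(path) as f:
            records.append(json.load(f))
        # Remember the file so the next poll skips it.
        seen.add(path.name)
    return records
```

Calling it repeatedly with the same `seen` set (for example once every few seconds) yields each file's record exactly once, which is roughly the behavior a file-based streaming source provides.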

I've got [another interesting version](https://daddyprasad5.github.io/interactive_gmaps_and_four_square.html) of this that uses Jupyter Widgets to let the notebook user pick a city, then collects the trending venues for that city from the Foursquare API and displays their locations on a Google map.

In [1]:
# Point Colaboratory to Google Drive
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
#imports
import requests
import json
import datetime
from time import sleep

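#set constants
#LOG_PATH is a file-name prefix: the city and a timestamp are appended to it below
LOG_PATH = "/content/gdrive/My Drive/Colab Datasets/foursquare_logs/foursquare_trending"
#KEY_FILE is defined here but not used anywhere else in this notebook
KEY_FILE = "/content/gdrive/My Drive/Colab Datasets/gookey.txt"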
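
The request further down embeds the Foursquare client ID and secret directly in the code. One common alternative is to keep them in a private file and load them at run time. The sketch below assumes a hypothetical JSON file with `client_id` and `client_secret` fields; the actual format of the file behind `KEY_FILE` is not shown in this notebook, so both the layout and the helper name `load_credentials` are assumptions.

```python
import json


def load_credentials(path):
    """Read API credentials from a small JSON file (hypothetical layout)."""
    with open(path) as f:
        creds = json.load(f)
    # Fail early with a clear message if an expected field is missing.
    missing = {"client_id", "client_secret"} - creds.keys()
    if missing:
        raise KeyError(f"missing credential fields: {sorted(missing)}")
    return creds["client_id"], creds["client_secret"]
```

The two returned values could then be passed to the API call in place of the literal strings, which keeps secrets out of a notebook that is published to GitHub.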


In [0]:
#this is for simulating a continuous stream
#it will drop one file (one city) roughly every 5 seconds for about a minute
#(3 passes over 4 cities = 12 files) and then stop

locs = ["New York City", "San Francisco", "Paris", "London"]

#should be "while True" but I don't want to eat up my API rate limit by accident
for i in range(3):
    for loc in locs:
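      #pass the query as params so requests URL-encodes the city name
      #note: the credentials are hard-coded here; a private file would be safer
      trending_api = "https://api.foursquare.com/v2/venues/trending"
      params = {"near": loc, "limit": 10, "radius": 2000, "v": "20190724",
                "client_id": "VF5BPQB0PA5NCWMX0J53UE4IVTGHJSNZ5BJFFFNFPQXSLNFG",
                "client_secret": "SNQTNEXKJAHFXA11EXTWUDG13GGL4P2B5JMCPM21XJJOQ3SY"}
      req = requests.get(trending_api, params=params)
      response = req.json()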
      names = []
      locations = []

      #the API occasionally returns an error - seems a bit flaky...
      if response["meta"]["code"] != 200: #error on request - skip this city and move on to the next
        print('got an error...')
        continue
      
      for v in response["response"]["venues"]:
        names.append(v["name"])
        locations.append((v["location"]["lat"], v["location"]["lng"]))

      trending_venues = {"city":loc, "names":names, "locations": locations}

      now = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')  
      fname = LOG_PATH+loc+'-'+now+'.json'
      with open(fname, 'w') as f:  # writing JSON object
        json.dump(trending_venues, f)

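      #read the file back as a sanity check that valid JSON was written
      with open(fname, 'r') as f: 
        venues = json.load(f)
      #wait 5 seconds before dropping the next file
      sleep(5)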
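
The loop above does two separable things: it reshapes the API response into a small record, and it writes that record into the watched folder. The sketch below factors both into helpers. Spark's file-based streaming sources expect files to appear in the watched directory atomically, so the writer first writes to a temporary file and then renames it into place. The helper names are hypothetical, the leading dot on the temporary file name relies on the assumption that the reader ignores dot-files, and whether a rename is truly atomic on a mounted Google Drive folder has not been verified here.

```python
import json
import os
import tempfile


def extract_trending(response, city):
    """Build the record the loop writes: city, venue names, (lat, lng) pairs."""
    venues = response["response"]["venues"]
    return {
        "city": city,
        "names": [v["name"] for v in venues],
        "locations": [(v["location"]["lat"], v["location"]["lng"]) for v in venues],
    }


def write_json_atomically(record, final_path):
    """Write to a temporary file in the same directory, then rename into place."""
    directory = os.path.dirname(final_path) or "."
    # The temp file lives in the target directory so the rename stays on one
    # filesystem; the leading dot is meant to hide it from directory readers.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    # os.replace is atomic on POSIX filesystems when source and target share one.
    os.replace(tmp_path, final_path)
```

Inside the loop, `write_json_atomically(extract_trending(response, loc), fname)` would replace the two list-building lines and the direct `open(fname, 'w')` write. Note that `json.dump` stores the `(lat, lng)` tuples as two-element lists, just as the original code does.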
  