# Setting up Foursquare data for analysis 

Set up your Foursquare API credentials (an access token, or a client ID and secret).

In [1]:
import foursquare
import json
import pandas as pd
import unicodedata

#ACCESS_TOKEN = ""
#client = foursquare.Foursquare(access_token=ACCESS_TOKEN)

# Replace the placeholders with your own app credentials; keep real secrets out of shared notebooks.
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
client = foursquare.Foursquare(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
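
Hard-coded credentials are easy to leak when a notebook is shared. One common alternative, sketched here, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical, chosen only for this illustration.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise ValueError(
            'Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret
```

The returned pair could then be passed to `foursquare.Foursquare(client_id=..., client_secret=...)` in the same way as the constants in the cell above.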

Use a method from the Foursquare Python library to search for venues around a city near you. Then print the resulting JSON with readable spacing and indentation.

In [2]:
starting_list = client.venues.search(params={'near': 'Seattle, WA', 'radius': 1500})

Wow... that should look like a total mess to you. Read the docs at https://docs.python.org/2/library/json.html, in particular the part about pretty printing. Once you think you've understood the method, use it here and see what a world of difference a bit of spacing and indentation makes!
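
As a small illustration of the idea, the sketch below prints the same dictionary twice: once compactly and once with the `indent` and `sort_keys` arguments of `json.dumps`. The `sample` dictionary is a hand-written stand-in, not a real API response.

```python
import json

# Hand-written stand-in for a nested response.
sample = {
    'name': 'Seattle',
    'geometry': {'center': {'lat': 47.60621, 'lng': -122.33207}},
}

# Default output: everything on one line.
compact = json.dumps(sample)

# indent adds line breaks and nesting; sort_keys orders keys alphabetically.
pretty = json.dumps(sample, indent=4, sort_keys=True)

print(compact)
print(pretty)
```

Both strings decode back to the same data; only the whitespace and key order differ.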

In [3]:
print(json.dumps(starting_list, indent=4))


{
    "confident": false, 
    "geocode": {
        "parents": [], 
        "what": "", 
        "where": "seattle wa", 
        "feature": {
            "highlightedName": "<b>Seattle</b>, <b>WA</b>, United States", 
            "displayName": "Seattle, WA, United States", 
            "name": "Seattle", 
            "longId": "72057594043737780", 
            "cc": "US", 
            "id": "geonameid:5809844", 
            "geometry": {
                "center": {
                    "lat": 47.60621, 
                    "lng": -122.33207
                }, 
                "bounds": {
                    "sw": {
                        "lat": 47.481719999999996, 
                        "lng": -122.459696
                    }, 
                    "ne": {
                        "lat": 47.734145, 
                        "lng": -122.224433
                    }
                }
            }, 
            "matchedName": "Seattle, WA, United States", 
            "woeType": 7, 
   

Now that we can make some sense of the structure, let's practice traversing the JSON hierarchy. Select one of the venues in the list and output its name.

In [4]:
starting_list['venues'][3]['name']

u'Kurt Cobain Park'
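
The indexing pattern used above generalizes: dictionaries are indexed by key and lists by position, so a path into the response alternates between the two. A minimal sketch on a hand-written stand-in (the `sample_response` contents are invented for illustration and are much smaller than a real response):

```python
# Hand-written stand-in for a search response.
sample_response = {
    'venues': [
        {'id': 'a1', 'name': 'Example Cafe', 'categories': [{'name': 'Cafe'}]},
        {'id': 'b2', 'name': 'Example Park', 'categories': []},
    ]
}

# Key, then position, then key again.
first_name = sample_response['venues'][0]['name']

# Looping over the list visits every venue in turn.
all_names = [venue['name'] for venue in sample_response['venues']]

print(first_name)
print(all_names)
```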

In [5]:
type(starting_list['venues'][3]['name'])

unicode

Note that the output isn't exactly what we want. It says u'Kurt Cobain Park', and if you check the type, Python reports unicode rather than str. For this analysis we want a plain string instead. Read the following docs:

https://docs.python.org/2/library/unicodedata.html, and look up the method 'normalize'. Once you think you've understood it, apply it to the call above and see if you can recover a plain string.


In [None]:
starting_list['venues'][2]['name'].encode('ascii', 'ignore')
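
The cell above calls `.encode('ascii', 'ignore')` directly, which silently drops any non-ASCII character. A possible refinement, sketched below, first applies `unicodedata.normalize` with the `'NFKD'` form so that an accented letter is split into a base letter plus a combining mark, and only the mark is dropped. The helper name `to_ascii` and the sample string are hypothetical. On Python 2 the result is a `str`; on Python 3 it is a `bytes` object.

```python
import unicodedata


def to_ascii(text):
    """Return an ASCII-only encoding of a Unicode string.

    NFKD normalization decomposes accented characters into a base letter
    plus combining marks; encoding with 'ignore' then drops the marks.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore')


# Without normalization the accented letter disappears entirely.
plain_encode = u'Caf\u00e9'.encode('ascii', 'ignore')

# With normalization the base letter survives.
normalized_encode = to_ascii(u'Caf\u00e9')

print(plain_encode)
print(normalized_encode)
```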

Now for some exploratory analysis: let's print the total number of venues in your list.

In [None]:
len(starting_list['venues'])


Extract the id of one venue from your starting list. Make sure it's converted to a plain string, not Unicode. Put this id in a variable called temp. With this id, we can request the full details for that venue.

In [None]:
temp = starting_list['venues'][2]['id'].encode('ascii', 'ignore')


Print the venue details (as nicely formatted JSON).

In [None]:
temp1 = client.venues(temp)
print(json.dumps(temp1, indent=4))

Create a procedure that extracts only the comments (the tip texts) into a list. There are a few ways you can do this, but I highly recommend you look up Python's built-in map function: https://docs.python.org/2/tutorial/datastructures.html

This is the same "map" that forms one half of the map-reduce duo used in "Big Data" applications, so it may be helpful to get familiar with it now if that's where you think you may want to take your career in the future.

In [None]:
map(lambda h: h['text'], temp1['venue']['tips']['groups'][0]['items'])
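
For comparison, here is the same extraction on a hand-written list, once with `map` and once as a list comprehension. The `sample_items` data is invented for illustration. Wrapping `map` in `list()` is harmless on Python 2 and necessary on Python 3, where `map` returns a lazy iterator.

```python
# Hand-written stand-in for the 'items' list of a tips group.
sample_items = [
    {'text': 'Great coffee', 'likes': 3},
    {'text': 'Quiet in the mornings', 'likes': 1},
]

# map applies the function to every element of the list.
texts_with_map = list(map(lambda item: item['text'], sample_items))

# The equivalent list comprehension.
texts_with_comprehension = [item['text'] for item in sample_items]

print(texts_with_map)
```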

Now we're going to bring the above mini-tasks together into a nice little function that will let us convert Foursquare JSON data into a rectangular table for further analysis. First instantiate a pandas data frame.

Write a procedure that takes the list of venues around a given location (a city name, a lat/long pair, or similar) and outputs a table with one row per comment (a venue with multiple comments gets multiple rows). Each row should hold the comment, the venue name, the tip count, the user count, and the store category. Make sure that each column is populated with appropriately typed values, i.e. names and categories should be strings, and counts should be numeric.

**Hint**: Before you begin, think about the process. You're going to start with a loop of some kind, so think about the following:
- How many loops do you need?
- Think about the JSON structure: how "deep" do you need to go into the hierarchy to reach the data you want? (This will help you decide how many loops your crawler needs.)
- How should you iteratively add on to your Pandas data frame?
- Think of any tests you may need to put in to ensure your procedure does not cause an error. (This may help you figure out how many if statements you need, and where to place them.)


In [None]:
def city_venues(city):
    """Return a data frame with one row per tip for venues found near `city`."""
    venues_list = client.venues.search(params={'near': city, 'intent': 'browse'})
    frames = []
    for venue in venues_list['venues']:
        # Fetch the full venue record, which holds the tips and the stats.
        ven_id = venue['id'].encode('ascii', 'ignore')
        ven = client.venues(ven_id)
        # Guard against venues that have no tip groups at all.
        groups = ven['venue']['tips']['groups']
        tips = groups[0]['items'] if groups else []
        mini_df = pd.DataFrame()
        # One row per tip; the scalar columns below are repeated on every row.
        mini_df['Comments'] = [tip['text'].encode('ascii', 'ignore') for tip in tips]
        mini_df['Venue_Name'] = venue['name'].encode('ascii', 'ignore')
        mini_df['Tip_Count'] = ven['venue']['stats']['tipCount']
        mini_df['User_Count'] = ven['venue']['stats']['usersCount']
        if venue['categories']:
            mini_df['Category'] = venue['categories'][0]['name'].encode('ascii', 'ignore')
        else:
            mini_df['Category'] = "No Category"
        frames.append(mini_df)
    if not frames:
        return pd.DataFrame()
    # Concatenate once at the end instead of growing the frame inside the loop.
    return pd.concat(frames).reset_index(drop=True)
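
An alternative way to assemble the table, sketched here without any API calls, is to build one plain dictionary per tip and hand the whole list to pandas at the end. The helper `tips_to_rows` and the sample values are hypothetical; the column names mirror the ones used in `city_venues`.

```python
def tips_to_rows(venue_name, category, stats, tips):
    """Return one dict per tip, repeating the venue-level fields on each row."""
    return [
        {
            'Comments': tip['text'],
            'Venue_Name': venue_name,
            'Tip_Count': stats['tipCount'],
            'User_Count': stats['usersCount'],
            'Category': category,
        }
        for tip in tips
    ]


# Invented sample data standing in for one venue's details.
sample_rows = tips_to_rows(
    'Example Cafe',
    'Cafe',
    {'tipCount': 2, 'usersCount': 40},
    [{'text': 'Great coffee'}, {'text': 'Quiet in the mornings'}],
)

print(len(sample_rows))
```

Passing such a list to `pd.DataFrame(sample_rows)` builds the table in a single call, which avoids repeated concatenation when there are many venues.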


In [None]:
seattle_venues = city_venues('Seattle, WA')

You've done it! You've built a simple crawler that traverses a JSON hierarchy and deposits the results in a nice Pandas data frame. Congratulations! You're now ready for more data mining in the future, and have just beefed up the **data** part of the data science combination :)

In [None]:
seattle_venues.info()

In [None]:
seattle_venues