# Data

This project uses only Foursquare location data, obtained through the `explore` endpoint of the Foursquare Places API. The example below shows how the data is collected for the city of San Francisco, California, USA.

### Obtaining Data From FourSquare

##### Importing Libraries

In [1]:
import pandas as pd
import requests
import json
import folium

##### Setting Foursquare API credentials and request parameters

In [2]:
CLIENT_ID = 'Nope' # your Foursquare ID
CLIENT_SECRET = 'Nope' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50 # A default Foursquare API limit value
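
The functions in this notebook assemble their request URLs with `str.format`. As a hedged alternative, the query string can be built with `urllib.parse.urlencode`, which also escapes spaces in city names. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(city_name, client_id, client_secret,
                      version='20180605', limit=50, offset=0):
    """Return an explore URL for food venues near city_name (hypothetical helper)."""
    params = {
        'near': city_name,
        'section': 'food',
        'day': 'any',
        'time': 'any',
        'limit': limit,
        'offset': offset,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
    }
    # urlencode escapes characters such as spaces ('San Francisco' -> 'San+Francisco').
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; substitute real ones before sending a request.
example_url = build_explore_url('San Francisco', 'YOUR_ID', 'YOUR_SECRET')
print(example_url)
```

Keeping the parameters in a dictionary means the first request and the paged requests can share one code path and differ only in `offset`.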


##### Function that returns JSON data for all the food places in a given city

In [3]:
def getVenuesByCity(cityName):
    url = 'https://api.foursquare.com/v2/venues/explore?near={}&section=food&day=any&time=any&limit={}&client_id={}&client_secret={}&v={}'.format(
        cityName,
        LIMIT,
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION)
    
    # First page: it also reports how many results exist in total.
    results = requests.get(url).json()
    allVenues = results['response']['groups'][0]['items']
    totalResults = results['response']['totalResults']
    OFFSET = LIMIT
    # Request the remaining pages until every result has been collected.
    while OFFSET < totalResults:
        url = 'https://api.foursquare.com/v2/venues/explore?near={}&section=food&day=any&time=any&limit={}&offset={}&client_id={}&client_secret={}&v={}'.format(
            cityName,
            LIMIT,
            OFFSET,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION)
        OFFSET += LIMIT
        allVenues += requests.get(url).json()['response']['groups'][0]['items']
    return allVenues

In [4]:
sf_Results = getVenuesByCity('San Francisco')
len(sf_Results)

222
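
The paging logic in `getVenuesByCity` can be illustrated without network access. The sketch below assumes a hypothetical `fetch_page(offset, limit)` callable that returns an `(items, total)` pair; in the notebook this role is played by the `requests.get(...)` calls. The fake venue list is made-up data sized to match the count above.

```python
def collect_all(fetch_page, limit=50):
    """Collect items from every page using an offset that advances by limit."""
    items, total = fetch_page(0, limit)
    collected = list(items)
    offset = limit
    while offset < total:
        page, _ = fetch_page(offset, limit)
        collected.extend(page)
        offset += limit
    return collected

# Stand-in for the API: 222 fake venues served in slices (made-up data).
FAKE_VENUES = ['venue-{}'.format(i) for i in range(222)]

def fake_fetch(offset, limit):
    """Return one slice of the fake venues plus the total count."""
    return FAKE_VENUES[offset:offset + limit], len(FAKE_VENUES)

all_items = collect_all(fake_fetch)
print(len(all_items))
```

Passing the fetcher in as an argument makes the loop easy to check against known data before spending API quota on real requests.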

##### Locations Visualised
The map below shows every location; click a marker to see its category.

In [5]:
Latitude, Longitude = 37.7749, -122.4194  # San Francisco coordinates
map_sf = folium.Map(location=[Latitude,Longitude],zoom_start=13)

# Iterate over the results directly instead of hard-coding their count.
for item in sf_Results:
    lat = item['venue']['location']['lat']
    lng = item['venue']['location']['lng']
    label = item['venue']['categories'][0]['name']
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker([lat, lng],
                        popup=label,
                        radius=5,
                        color='#D2691E',
                        fill=True,
                        fill_color='#D2691E',
                        fill_opacity=0.3).add_to(map_sf)
                        
map_sf
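
The loop above assumes every venue has at least one category. A more defensive version of the extraction step might look like the sketch below; the helper `venue_marker` and the sample items are hypothetical, shaped after the fields the notebook reads from the response.

```python
def venue_marker(item, default_label='Unknown'):
    """Return (lat, lng, label) for one explore item, tolerating missing categories."""
    venue = item['venue']
    location = venue['location']
    categories = venue.get('categories') or []
    label = categories[0]['name'] if categories else default_label
    return location['lat'], location['lng'], label

# Made-up items that mimic the structure read by the notebook.
sample_items = [
    {'venue': {'location': {'lat': 37.77, 'lng': -122.42},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'location': {'lat': 37.78, 'lng': -122.41},
               'categories': []}},
]

markers = [venue_marker(item) for item in sample_items]
print(markers)
```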

### Adding all the Categories to a pandas dataframe

##### Defining Dataframe

In [6]:
df = pd.DataFrame()

##### Getting the names and coordinates of the top 40 US cities by population

In [7]:
city_list = pd.read_html("https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population")[4][1:41].drop([0,2,3,4,5,6,7,8,9], axis=1)

In [8]:
cityList = city_list[1].tolist()
citiesList =[]
for item in cityList:
    item = item.split('[')[0]
    citiesList.append(item)

df['City'] = citiesList
df

| | City |
|---|---|
| 0 | New York City |
| 1 | Los Angeles |
| 2 | Chicago |
| 3 | Houston |
| 4 | Phoenix |
| 5 | Philadelphia |
| 6 | San Antonio |
| 7 | San Diego |
| 8 | Dallas |
| 9 | San Jose |
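
City names scraped from Wikipedia can carry footnote markers in square brackets, which is why the code above splits each name on `[`. The same cleanup can be sketched with a regular expression; `strip_footnotes` and the sample strings are illustrative assumptions, not values taken from the scraped table.

```python
import re

def strip_footnotes(name):
    """Remove bracketed footnote markers like '[d]' and surrounding whitespace."""
    return re.sub(r'\[[^\]]*\]', '', name).strip()

# Made-up inputs showing names with and without footnote markers.
raw_names = ['New York City[d]', 'Los Angeles', 'San Jose[12]']
clean_names = [strip_footnotes(n) for n in raw_names]
print(clean_names)
```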


##### Function that adds the number of restaurants in each category

In [9]:
def addCatsByCity(cityName):
    url = 'https://api.foursquare.com/v2/venues/explore?near={}&section=food&day=any&time=any&limit={}&client_id={}&client_secret={}&v={}'.format(
        cityName,
        LIMIT,
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION)
    
    results = requests.get(url).json()  # fetch the first page once and reuse it
    firstPage = results['response']['groups'][0]['items']
    for item in firstPage:  # a page may hold fewer than 50 venues
        cat = item['venue']['categories'][0]['name']
        if cat in df.columns:
            df.loc[df.City == cityName, cat] += 1
        else:
            df[cat]=[0]*len(df)
            df.loc[df.City == cityName, cat] += 1
    
    totalResults = results['response']['totalResults']
    OFFSET = 50
    while(OFFSET<totalResults):
        url = 'https://api.foursquare.com/v2/venues/explore?near={}&section=food&day=any&time=any&limit={}&offset={}&client_id={}&client_secret={}&v={}'.format(
            cityName,
            LIMIT,
            OFFSET,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION)
        nextPage = requests.get(url).json()['response']['groups'][0]['items']
        for item in nextPage:
            cat = item['venue']['categories'][0]['name']
            if cat in df.columns:
                df.loc[df.City == cityName, cat] += 1
            else:
                df[cat]=[0]*len(df)
                df.loc[df.City == cityName, cat] += 1
        OFFSET +=50
    return None
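
The counting inside `addCatsByCity` updates the dataframe one cell at a time. As an alternative sketch, the categories of a city can be tallied first with `collections.Counter` and written to the dataframe afterwards; `count_categories` and the sample items are hypothetical.

```python
from collections import Counter

def count_categories(items):
    """Tally the primary category name of each venue item."""
    return Counter(
        item['venue']['categories'][0]['name']
        for item in items
        if item['venue'].get('categories')  # skip venues without a category
    )

# Made-up items that mimic the structure read by the notebook.
sample_items = [
    {'venue': {'categories': [{'name': 'Bakery'}]}},
    {'venue': {'categories': [{'name': 'Taco Place'}]}},
    {'venue': {'categories': [{'name': 'Bakery'}]}},
    {'venue': {'categories': []}},
]

counts = count_categories(sample_items)
print(counts)
```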

In [10]:
for city in citiesList:
    addCatsByCity(city)

In [11]:
df

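| | City | Taco Place | Bakery | Restaurant | Thai Restaurant | Sandwich Place | Seafood Restaurant | Israeli Restaurant | Steakhouse | American Restaurant | ... | Portuguese Restaurant | Chaat Place | Austrian Restaurant | Belgian Restaurant | Indonesian Restaurant | Brasserie | Romanian Restaurant | Buffet | Andhra Restaurant | Fish & Chips Shop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | New York City | 4 | 14 | 3 | 9 | 8 | 8 | 1 | 2 | 11 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Los Angeles | 7 | 12 | 8 | 0 | 13 | 7 | 0 | 4 | 15 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Chicago | 5 | 7 | 7 | 0 | 9 | 5 | 1 | 6 | 9 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Houston | 5 | 4 | 3 | 2 | 6 | 6 | 0 | 8 | 14 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Phoenix | 5 | 2 | 4 | 2 | 3 | 4 | 0 | 7 | 12 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Philadelphia | 3 | 8 | 3 | 2 | 14 | 4 | 3 | 3 | 6 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | San Antonio | 1 | 6 | 3 | 3 | 5 | 9 | 0 | 10 | 7 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | San Diego | 9 | 7 | 2 | 5 | 7 | 20 | 0 | 2 | 12 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Dallas | 9 | 6 | 6 | 4 | 4 | 8 | 0 | 12 | 15 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | San Jose | 3 | 11 | 2 | 2 | 12 | 5 | 0 | 2 | 4 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |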


In [12]:
len(df.columns)

115

This is only an example; more cities may be added to the list as the dataframe is prepared.

### Description

The final form of the data is a pandas dataframe in which:

Rows: cities (a big city may be divided into smaller areas)

Columns: categories of food places

Each cell: the number of places corresponding to the respective city and category.
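
To make this layout concrete, the sketch below assembles a small city-by-category table from per-city counts, filling missing categories with zero. The counts are invented for illustration and `build_rows` is a hypothetical helper; passing its result to `pd.DataFrame` should give a dataframe with the shape described above.

```python
def build_rows(counts_by_city):
    """Turn {city: {category: count}} into row dicts with one key per category."""
    categories = sorted({cat for counts in counts_by_city.values() for cat in counts})
    rows = []
    for city, counts in counts_by_city.items():
        row = {'City': city}
        for cat in categories:
            row[cat] = counts.get(cat, 0)  # zero when the city has no such place
        rows.append(row)
    return rows

# Invented counts, not taken from the Foursquare results.
example_counts = {
    'City A': {'Bakery': 3, 'Steakhouse': 1},
    'City B': {'Taco Place': 2},
}

rows = build_rows(example_counts)
for row in rows:
    print(row)
```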