# Foursquare API Data

In this notebook, we extract data on London restaurants using the Foursquare Places API.

## Import Libraries

In [1]:
import pandas as pd 
import requests
import numpy as np
import plotly.express as px
import geopandas
import matplotlib.pyplot as plt

## Load data

In [36]:
df = pd.read_csv('Data/London_cleaned_data.csv')

In [3]:
df.head()

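|   | Borough | Area (sq mi) | Population_2019 | Density | Median_Househols_Income | N_employees | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 13.93 | 212906 | 5871 | 21953 | 57715 | 51.5607 | 0.1557 |
| 1 | Barnet | 33.49 | 395896 | 4520 | 34163 | 134650 | 51.6252 | 0.1517 |
| 2 | Bexley | 23.38 | 248287 | 4082 | 29192 | 78930 | 51.4549 | 0.1505 |
| 3 | Brent | 16.7 | 329771 | 7652 | 28847 | 123260 | 51.5588 | 0.2817 |
| 4 | Bromley | 57.97 | 332336 | 2205 | 33659 | 108250 | 51.4039 | 0.0198 |

Note that Barnet and Brent lie west of the Greenwich meridian, so their longitudes should be negative. The positive values above suggest the minus sign was lost somewhere in the data preparation, which would centre the searches below east of those boroughs.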


In [4]:
df.shape

(32, 8)

## Getting Venue Data with the Foursquare API

In [5]:
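import os

# Credentials generated on the Foursquare developer page. Read them from
# environment variables (the variable names are our own choice) instead of
# hard-coding secrets in the notebook. CLIENT_ID, CLIENT_SECRET and VERSION
# are v2-style credentials; the v3 requests below authenticate with an API key.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET')
VERSION = '20220307'
# Each Place Search request returns at most 50 venues.
LIMIT = 50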



### Extract Restaurant Data Using the Category ID

In [6]:
import os

# Central London latitude and longitude (for reference; not used below).
# Central London lies just west of the Greenwich meridian, so the
# longitude is negative.
lat = 51.50
lng = -0.11


def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    """Return up to LIMIT restaurants within `radius` metres of each borough centre."""
    headers = {
        'Accept': 'application/json',
        # v3 API key, read from an environment variable (name is our choice)
        'Authorization': os.environ['FOURSQUARE_API_KEY'],
    }
    # Category ID 13065 is "Restaurant" in the Foursquare v3 category taxonomy.
    URL = 'https://api.foursquare.com/v3/places/search?ll={},{}&radius={}&limit={}&categories=13065'

    df_list = []

    for name, lat, lng in zip(names, latitudes, longitudes):
        url = URL.format(lat, lng, radius, LIMIT)
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # fail loudly on authentication or quota errors
        for each_result in response.json()['results']:
            result = {}
            result['Borough'] = name
            result['Neighborhood Latitude'] = lat
            result['Neighborhood Longitude'] = lng
            result['Name'] = each_result['name']
            # Some venues have no coordinates; fill them with "" instead of raising
            try:
                result['Restaurant Latitude'] = each_result['geocodes']['main']['latitude']
                result['Restaurant Longitude'] = each_result['geocodes']['main']['longitude']
            except KeyError:
                result['Restaurant Latitude'] = ""
                result['Restaurant Longitude'] = ""
            # Keep the first (primary) category; some venues have none
            categories = each_result.get('categories') or [{}]
            result['Category_Names'] = categories[0].get('name', "")
            df_list.append(result)
    return pd.DataFrame(df_list)
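Instead of formatting the URL by hand, `requests` can build and encode the query string from a dictionary. The sketch below prepares (but does not send) the same Place Search request; `build_places_request` is a hypothetical helper name and `"YOUR_API_KEY"` is a placeholder.

```python
import requests

PLACES_SEARCH_URL = "https://api.foursquare.com/v3/places/search"


def build_places_request(lat, lng, api_key, radius=2000, limit=50,
                         categories=None, query=None):
    """Prepare, without sending, a Foursquare v3 Place Search request."""
    params = {"ll": f"{lat},{lng}", "radius": radius, "limit": limit}
    # This notebook filters either by category ID or by a free-text query
    if categories is not None:
        params["categories"] = categories
    if query is not None:
        params["query"] = query
    headers = {"Accept": "application/json", "Authorization": api_key}
    return requests.Request("GET", PLACES_SEARCH_URL,
                            params=params, headers=headers).prepare()


req = build_places_request(51.5607, 0.1557, api_key="YOUR_API_KEY",
                           categories="13065")
print(req.url)
# To send it: requests.Session().send(req).json()
```

Passing `params` lets `requests` handle URL encoding (for example the comma in `ll`), and switching between the category and query searches becomes a keyword argument rather than a second URL template.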

In [7]:

# call the function
df_result=getNearbyVenues(df['Borough'],df['latitude'],df['longitude'])


In [8]:
df_result.shape

(1188, 7)

The category search returned 1,188 venues. Next, we repeat the search using a keyword query (`query=restaurant`) instead of the category ID.
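With 32 boroughs and at most 50 results per request, each search can return at most 32 × 50 = 1,600 venues. The shortfall of 1,600 − 1,188 = 412 means at least nine boroughs returned fewer than 50 venues, while boroughs that reached the cap may contain more restaurants than were returned. A minimal sketch for listing the capped boroughs, shown on toy data (the helper name `boroughs_at_limit` is hypothetical):

```python
import pandas as pd

LIMIT = 50  # maximum results per Place Search request


def boroughs_at_limit(results: pd.DataFrame, limit: int = LIMIT) -> pd.Series:
    """Return venue counts for the boroughs whose search hit the result cap."""
    counts = results.groupby("Borough").size()
    return counts[counts >= limit]


# Toy data: one borough hit the 50-venue cap, the other did not
toy = pd.DataFrame({"Borough": ["Camden"] * 50 + ["Bexley"] * 12})
capped = boroughs_at_limit(toy)
print(capped)
```

Running `boroughs_at_limit(df_result)` on the real results would show which boroughs could benefit from smaller search radii or paginated requests.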

### Extract London Restaurant Data Using `query=restaurant`

In [9]:
import os

# Same as getNearbyVenues, but searching with query=restaurant
# instead of the restaurant category ID

def getNearbyVenues1(names, latitudes, longitudes, radius=2000):
    """Return up to LIMIT venues matching 'restaurant' around each borough centre."""
    headers = {
        'Accept': 'application/json',
        # v3 API key, read from an environment variable (name is our choice)
        'Authorization': os.environ['FOURSQUARE_API_KEY'],
    }

    URL = 'https://api.foursquare.com/v3/places/search?ll={},{}&radius={}&limit={}&query=restaurant'

    df_list = []

    for name, lat, lng in zip(names, latitudes, longitudes):
        url = URL.format(lat, lng, radius, LIMIT)
        response = requests.get(url, headers=headers)
        response.raise_for_status()  # fail loudly on authentication or quota errors
        for each_result in response.json()['results']:
            result = {}
            result['Borough'] = name
            result['Neighborhood Latitude'] = lat
            result['Neighborhood Longitude'] = lng
            result['Name'] = each_result['name']
            # Some venues have no coordinates; fill them with "" instead of raising
            try:
                result['Restaurant Latitude'] = each_result['geocodes']['main']['latitude']
                result['Restaurant Longitude'] = each_result['geocodes']['main']['longitude']
            except KeyError:
                result['Restaurant Latitude'] = ""
                result['Restaurant Longitude'] = ""
            # Keep the first (primary) category; some venues have none
            categories = each_result.get('categories') or [{}]
            result['Category_Names'] = categories[0].get('name', "")
            df_list.append(result)
    return pd.DataFrame(df_list)
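The two functions differ only in the search filter (`categories=13065` versus `query=restaurant`). One way to avoid duplicating the parsing logic is a shared helper that turns one Place Search response into rows. The sketch below is an assumption, not the notebook's method: `parse_places` is a hypothetical name, and it uses `.get()` so that venues without geocodes or categories yield `None` instead of raising.

```python
import pandas as pd


def parse_places(results, borough, lat, lng):
    """Convert one Place Search JSON response into a list of row dicts."""
    rows = []
    for place in results.get("results", []):
        main = place.get("geocodes", {}).get("main", {})
        categories = place.get("categories") or [{}]
        rows.append({
            "Borough": borough,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lng,
            "Name": place.get("name"),
            # Missing coordinates become None instead of ""
            "Restaurant Latitude": main.get("latitude"),
            "Restaurant Longitude": main.get("longitude"),
            # First (primary) category, or None when the venue has none
            "Category_Names": categories[0].get("name"),
        })
    return rows


# Hand-written sample response; the second venue is hypothetical
sample = {"results": [
    {"name": "Lara Grill",
     "geocodes": {"main": {"latitude": 51.562533, "longitude": 0.147262}},
     "categories": [{"name": "Fast Food Restaurant"}]},
    {"name": "No Coordinates Cafe", "categories": []},
]}
rows = parse_places(sample, "Barking and Dagenham", 51.5607, 0.1557)
print(pd.DataFrame(rows)[["Name", "Restaurant Latitude", "Category_Names"]])
```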

In [10]:

# call the function
df_result1=getNearbyVenues1(df['Borough'],df['latitude'],df['longitude'])


In [11]:
df_result1.shape

(825, 7)

The keyword search returned 825 venues, fewer than the category search.

Next, we check whether we got venues for every London borough.

In [12]:
df['Borough'].nunique()

32

In [13]:
df_result['Borough'].nunique()

32
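The cell above checks only `df_result` (the category search). Comparing sets of borough names also tells us *which* boroughs are missing, and the same check can be run on `df_result1`. A minimal sketch on toy data (`missing_boroughs` is a hypothetical helper name):

```python
import pandas as pd


def missing_boroughs(boroughs: pd.DataFrame, results: pd.DataFrame) -> set:
    """Boroughs in the reference table that have no venues in `results`."""
    return set(boroughs["Borough"]) - set(results["Borough"])


ref = pd.DataFrame({"Borough": ["Barnet", "Bexley", "Brent"]})
found = pd.DataFrame({"Borough": ["Barnet", "Barnet", "Brent"]})
print(missing_boroughs(ref, found))
```

On the real data, `missing_boroughs(df, df_result1)` returning an empty set would confirm that the keyword search also covered all 32 boroughs.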

In [16]:
# stack the two result sets vertically
London = pd.concat([df_result, df_result1], axis=0, ignore_index=True)


In [17]:
London.head()

|   | Borough | Neighborhood Latitude | Neighborhood Longitude | Name | Restaurant Latitude | Restaurant Longitude | Category_Names |
|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 51.5607 | 0.1557 | Lara Grill | 51.562533 | 0.147262 | Fast Food Restaurant |
| 1 | Barking and Dagenham | 51.5607 | 0.1557 | The Beacon Tree | 51.561391 | 0.140883 | Pub |
| 2 | Barking and Dagenham | 51.5607 | 0.1557 | Domino's Pizza | 51.572111 | 0.137844 | Pizzeria |
| 3 | Barking and Dagenham | 51.5607 | 0.1557 | Subway | 51.568813 | 0.178686 | Fast Food Restaurant |
| 4 | Barking and Dagenham | 51.5607 | 0.1557 | Millennium Cafe | 51.562272 | 0.147468 | Café |


In [18]:
# count fully duplicated rows (e.g. venues returned by both searches)
London.duplicated().sum()

321

In [19]:
# drop duplicate rows, keeping the first occurrence
London = London.drop_duplicates(keep='first')

In [20]:
London.shape

(1692, 7)
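`duplicated()` compares whole rows, so it removes venues returned by both searches for the same borough (1,188 + 825 − 321 = 1,692). Because the 2 km search circles of neighbouring boroughs may overlap, the same venue can still appear under two boroughs. Counting distinct venues by name and coordinates is one way to gauge this; treating name plus coordinates as a venue's identity is an assumption, and `count_unique_venues` is a hypothetical helper name.

```python
import pandas as pd


def count_unique_venues(venues: pd.DataFrame) -> int:
    """Number of distinct venues, ignoring which borough's search found them."""
    key = ["Name", "Restaurant Latitude", "Restaurant Longitude"]
    return len(venues.drop_duplicates(subset=key))


# Toy data: "Cafe A" was found by two borough searches
toy = pd.DataFrame({
    "Borough": ["Camden", "Islington", "Camden"],
    "Name": ["Cafe A", "Cafe A", "Cafe B"],
    "Restaurant Latitude": [51.54, 51.54, 51.53],
    "Restaurant Longitude": [-0.14, -0.14, -0.13],
})
print(len(toy), count_unique_venues(toy))
```

Whether to keep such cross-borough duplicates depends on the analysis: they are legitimate when counting venues near each borough centre, but they inflate a London-wide total.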

In [21]:
# export the combined dataframe
London.to_csv("Data/London_restaurants.csv", index = False)

## Merge the London Restaurants DataFrame with the London Boroughs DataFrame

In [37]:
# we will merge the dataframes based on the Borough values
df_final = London.merge(df,on='Borough')


In [38]:
df_final.shape

(1692, 14)
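The default inner join drops any restaurant whose borough name has no exact match in the borough table. The row count is unchanged here (1,692), so no rows were lost, but pandas' `validate` and `indicator` arguments make that check explicit. A sketch on toy data (populations taken from the borough table above):

```python
import pandas as pd

restaurants = pd.DataFrame({"Borough": ["Barnet", "Barnet", "Brent"],
                            "Name": ["Cafe A", "Cafe B", "Cafe C"]})
boroughs = pd.DataFrame({"Borough": ["Barnet", "Brent"],
                         "Population_2019": [395896, 329771]})

# validate="many_to_one" raises MergeError if a borough appears more than once
# in the right-hand table; indicator=True adds a "_merge" column showing
# whether each row found a match ("both") or not ("left_only").
merged = restaurants.merge(boroughs, on="Borough", how="left",
                           validate="many_to_one", indicator=True)
print(merged["_merge"].value_counts())
```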

In [39]:
# column names after merging
df_final.columns

Index(['Borough', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Name',
       'Restaurant Latitude', 'Restaurant Longitude', 'Category_Names',
       'Area (sq mi)', 'Population_2019', 'Density', 'Median_Househols_Income',
       'N_employees', 'latitude', 'longitude'],
      dtype='object')

In [40]:
df_final.dtypes

Borough                     object
Neighborhood Latitude      float64
Neighborhood Longitude     float64
Name                        object
Restaurant Latitude         object
Restaurant Longitude        object
Category_Names              object
Area (sq mi)               float64
Population_2019              int64
Density                    float64
Median_Househols_Income    float64
N_employees                float64
latitude                   float64
longitude                  float64
dtype: object
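`Restaurant Latitude` and `Restaurant Longitude` have `object` dtype because the venue functions fill missing coordinates with empty strings. A minimal sketch of converting them back to floats with `pd.to_numeric`, shown on toy values:

```python
import pandas as pd

coords = pd.DataFrame({
    "Restaurant Latitude": [51.562533, "", 51.561391],
    "Restaurant Longitude": [0.147262, "", 0.140883],
})
for col in ["Restaurant Latitude", "Restaurant Longitude"]:
    # errors="coerce" turns anything that cannot be parsed as a number into NaN
    coords[col] = pd.to_numeric(coords[col], errors="coerce")

print(coords.dtypes)
```

Applying the same loop to `df_final` would make the coordinate columns usable for distance calculations and mapping.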

The `Neighborhood Latitude` and `Neighborhood Longitude` columns are identical to `latitude` and `longitude`: they are the borough coordinates we passed to the Foursquare search, not values returned by Foursquare. We therefore drop `latitude` and `longitude` and keep the neighbourhood columns.
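Before dropping, it is cheap to confirm that the columns really are identical. A minimal sketch on toy rows (values taken from the borough table above):

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood Latitude": [51.5607, 51.4549],
    "latitude": [51.5607, 51.4549],
    "Neighborhood Longitude": [0.1557, 0.1505],
    "longitude": [0.1557, 0.1505],
})

pairs = [("Neighborhood Latitude", "latitude"),
         ("Neighborhood Longitude", "longitude")]
# Drop the borough columns only if every pair matches row by row
if all((toy[a] == toy[b]).all() for a, b in pairs):
    toy = toy.drop(columns=[b for _, b in pairs])

print(list(toy.columns))
```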

In [41]:
df_final = df_final.drop(columns=['latitude', 'longitude'])

In [42]:
df_final.head()

|   | Borough | Neighborhood Latitude | Neighborhood Longitude | Name | Restaurant Latitude | Restaurant Longitude | Category_Names | Area (sq mi) | Population_2019 | Density | Median_Househols_Income | N_employees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 51.5607 | 0.1557 | Lara Grill | 51.562533 | 0.147262 | Fast Food Restaurant | 13.93 | 212906 | 5.871 | 21.953 | 57.715 |
| 1 | Barking and Dagenham | 51.5607 | 0.1557 | The Beacon Tree | 51.561391 | 0.140883 | Pub | 13.93 | 212906 | 5.871 | 21.953 | 57.715 |
| 2 | Barking and Dagenham | 51.5607 | 0.1557 | Domino's Pizza | 51.572111 | 0.137844 | Pizzeria | 13.93 | 212906 | 5.871 | 21.953 | 57.715 |
| 3 | Barking and Dagenham | 51.5607 | 0.1557 | Subway | 51.568813 | 0.178686 | Fast Food Restaurant | 13.93 | 212906 | 5.871 | 21.953 | 57.715 |
| 4 | Barking and Dagenham | 51.5607 | 0.1557 | Millennium Cafe | 51.562272 | 0.147468 | Café | 13.93 | 212906 | 5.871 | 21.953 | 57.715 |


In [43]:
# export the dataframe which contains London's boroughs data and restaurants data.
df_final.to_csv("Data/final_data.csv", index = False)

## Visualising and Understanding the Data

### Bar Charts

In [None]:
# Bar chart of each borough's 2019 population, largest first
# (the borough data is in `df`)
bar1 = df.sort_values(by='Population_2019', ascending=False).plot(
    kind='bar', x='Borough', y='Population_2019', figsize=(15, 3));
# Title and axis labels
bar1.set_title("Borough Population in London (2019)");
bar1.set_xlabel("Borough");
bar1.set_ylabel("Total Population in 2019");



In [None]:
# Bar chart of the number of venues found in each borough
# (venues are in `df_final`; `df` has one row per borough)
bar2 = df_final.Borough.value_counts(ascending=False).plot(
    kind='bar', figsize=(15, 3), color='darkblue', rot=25, linewidth=4, edgecolor='white')
# Title and axis labels
bar2.set_title("Number of Venues per Borough");
bar2.set_xlabel("Borough");
bar2.set_ylabel("Number of venues");

In [None]:
df_final.Borough.value_counts()

In [None]:
df_final.Category_Names.value_counts()

# Some cafés appear under several different category names,
# so the categories will need to be grouped or encoded.

# Work in progress




- Boroughs that we should consider, based on population count:

- Boroughs that we should consider, based on population density:


## Features to Add
- Population density (population / area)
- A list of competitors
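Both features can be derived from the merged data. The sketch below computes population density in residents per square mile from `Population_2019` and `Area (sq mi)`, and counts competitors per borough. Treating "Café" venues as the competitors, and the column names `Density_per_sq_mi` and `N_competitors`, are assumptions; the toy rows reuse area and population values from the borough table.

```python
import pandas as pd

boroughs = pd.DataFrame({
    "Borough": ["Barking and Dagenham", "Barnet"],
    "Area (sq mi)": [13.93, 33.49],
    "Population_2019": [212906, 395896],
})
venues = pd.DataFrame({
    "Borough": ["Barking and Dagenham", "Barking and Dagenham", "Barnet", "Barnet"],
    "Category_Names": ["Café", "Café", "Pizzeria", "Pub"],
})

# Population density in residents per square mile
boroughs["Density_per_sq_mi"] = boroughs["Population_2019"] / boroughs["Area (sq mi)"]

# Competitors: venues per borough in the category of interest
competitors = (venues[venues["Category_Names"] == "Café"]
               .groupby("Borough").size()
               .rename("N_competitors")
               .reset_index())
boroughs = boroughs.merge(competitors, on="Borough", how="left")
# Boroughs with no matching venues get NaN from the left join; treat as 0
boroughs["N_competitors"] = boroughs["N_competitors"].fillna(0).astype(int)
print(boroughs)
```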

## Goals

- Create a DataFrame of the five most common café types in each neighbourhood (e.g. tea room, café with restaurant, and so on)
- Cluster neighbourhoods
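One way to build the "most common types" table is to count `Category_Names` per borough, rank the counts, and pivot the top *n* into columns. This is a sketch under assumptions: each borough search stands in for a "neighbourhood", ties are broken alphabetically, and `top_categories` is a hypothetical helper name.

```python
import pandas as pd


def top_categories(venues: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Most common venue categories per borough; column 1 is the most frequent."""
    counts = (venues.groupby(["Borough", "Category_Names"]).size()
              .rename("count").reset_index()
              .sort_values(["Borough", "count", "Category_Names"],
                           ascending=[True, False, True]))
    top = counts.groupby("Borough").head(n).copy()
    # Rank within each borough: 1 = most common, 2 = second, ...
    top["rank"] = top.groupby("Borough").cumcount() + 1
    return top.pivot(index="Borough", columns="rank", values="Category_Names")


toy = pd.DataFrame({
    "Borough": ["Barnet"] * 6 + ["Brent"],
    "Category_Names": ["Café", "Café", "Café", "Pub", "Pub", "Pizzeria", "Pub"],
})
result = top_categories(toy, n=2)
print(result)
```

Boroughs with fewer than *n* categories get NaN in the remaining columns.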

We will use the k-means algorithm to cluster similar London neighbourhoods, then explore how many competitors there are in each category.
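A minimal sketch of that clustering step, assuming scikit-learn's `KMeans` and describing each borough by the share of each venue category among its venues (`pd.crosstab` with `normalize="index"`). The toy boroughs "A" to "D" and the choice of two clusters are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

venues = pd.DataFrame({
    "Borough": ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4,
    "Category_Names": ["Café", "Café", "Café", "Pub",
                       "Café", "Café", "Café", "Pub",
                       "Pub", "Pub", "Pub", "Café",
                       "Pub", "Pub", "Pub", "Café"],
})

# Share of each category among each borough's venues (each row sums to 1)
profile = pd.crosstab(venues["Borough"], venues["Category_Names"],
                      normalize="index")

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profile)
labels = pd.Series(kmeans.labels_, index=profile.index, name="Cluster")
print(labels)
```

Using category shares rather than raw counts keeps boroughs with many results from dominating the distance calculation; the number of clusters would need to be chosen on the real data, for example with the elbow method.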

Finally, recommend a category for each borough.



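In [None]:
# Normalise borough names so they match the GeoJSON names exactly
df['Borough'] = df['Borough'].str.strip()


In [None]:
df.Borough = df.Borough.astype('string')


In [None]:
# Map each borough name to its GeoJSON feature id. `UK` must already hold a
# GeoJSON FeatureCollection of London boroughs (for example loaded with
# json.load). The property keys "state_code" and "st_nm" are placeholders:
# replace them with the id and name keys used by that GeoJSON file.
state_id_map = {}
for feature in UK["features"]:
    feature["id"] = feature["properties"]["state_code"]
    state_id_map[feature["properties"]["st_nm"]] = feature["id"]


In [None]:
# Must run after state_id_map has been built; unmatched names become missing
df["id"] = df["Borough"].map(state_id_map)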




In [None]:
# Choropleth of borough density; `UK` is the GeoJSON used to build state_id_map
fig = px.choropleth(
    df,
    locations="id",
    geojson=UK,
    color="Density",
    hover_name="Borough",
    hover_data=["Density"],
    title="London Population Density",
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

In [None]:
df.dtypes